From ogerlitz at voltaire.com Tue Apr 1 00:00:46 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 01 Apr 2008 10:00:46 +0300 Subject: [ofa-general] Re: IB/core: Add creation flags to struct ib_qp_init_attr In-Reply-To: References: <1205767427.25950.137.camel@mtls03> Message-ID: <47F1DD9E.7040304@voltaire.com> Roland Dreier wrote: > Subject: [PATCH] IB/core: Add creation flags to struct ib_qp_init_attr > > Add a create_flags member to struct ib_qp_init_attr that will allow a > kernel verbs consumer to pass special flags when creating a QP. > Add a flag value for telling low-level drivers that a QP will be used > for IPoIB UD LSO. The create_flags member will also be useful for XRC > and ehca low-latency QP support. Roland, can you please comment on the approach you prefer for the --user space-- implementation of features such as the ehca low-latency and the mlx4 block-loopback QP "types"? Do you want to go the XRC way of not breaking the ABI by introducing a new create-qp verb per feature, as Jack said they did: > I got around the create_flags problem by adding a new verb to userspace > (ibv_create_xrc_rcv_qp() ) with its own ABI to kernel space. Since the kernel-space > function (added to uverbs_cmd: ib_uverbs_create_xrc_rcv_qp() ) "knew" that it > was creating an XRC_RCV qp, it set the flag in ib_qp_init_attr appropriately. If this is what you prefer to see, does it make sense to have one new verb that can be used for XRC, ehca-ll, mlx4 block loopback, and whatever new features we want to add for user-space QPs in the future? Or.
From eli at dev.mellanox.co.il Tue Apr 1 00:53:12 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 01 Apr 2008 10:53:12 +0300 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <15ddcffd0803312353l57b0cfaft273c9f809387fb68@mail.gmail.com> References: <1206452112.25950.360.camel@mtls03> <15ddcffd0803312353l57b0cfaft273c9f809387fb68@mail.gmail.com> Message-ID: <1207036392.22081.12.camel@mtls03> On Tue, 2008-04-01 at 09:53 +0300, Or Gerlitz wrote: > On Tue, Mar 25, 2008 at 4:35 PM, Eli Cohen > wrote: > Add LSO support to the mlx4 driver so that it will be able > to send SKBs passed down from a driver that advertises NETIF_F_TSO. > > Signed-off-by: Eli Cohen > --- > Changes since last post: > 1. Verify that header length does not exceed 60 bytes. > 2. Remove unnecessary printk calls > > > OK, so this patch would complete the LSO merging! > > Eli - what is the status here: is some editing needed to comply > with the way the rest of the patches were merged (qp creation flags, > etc), or is it applicable to review/merge as posted? > I think it is, though I didn't yet have a chance to check it on top of Roland's latest commits.
From HNGUYEN at de.ibm.com Tue Apr 1 01:16:35 2008 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Tue, 1 Apr 2008 10:16:35 +0200 Subject: [ofa-general] Re: [PATCH 2/10] IB/core: Add creation flags to QPs In-Reply-To: Message-ID: Hi Roland! > Thanks, I applied this with some extra code in all the low-level > drivers to make sure that the create_flags are passed in as 0. Does > that make sense to everyone? The changes below make sense to me, as I would have to check the flags anyway when introducing the LL QP flag for ehca later. BTW: If you have a few minutes, please let us agree on the encoding scheme for qp_types and create_flags as discussed in this thread. Thanks! Nam > diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c > b/drivers/infiniband/hw/ehca/ehca_qp.c > index a9fd419..3eb14a5 100644 > --- a/drivers/infiniband/hw/ehca/ehca_qp.c > +++ b/drivers/infiniband/hw/ehca/ehca_qp.c > @@ -421,6 +421,9 @@ static struct ehca_qp *internal_create_qp( > u32 swqe_size = 0, rwqe_size = 0, ib_qp_num; > unsigned long flags; > > + if (init_attr->create_flags) > + return ERR_PTR(-EINVAL); > + > memset(&parms, 0, sizeof(parms)); > qp_type = init_attr->qp_type; > From ogerlitz at voltaire.com Tue Apr 1 01:18:08 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 01 Apr 2008 11:18:08 +0300 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <1207036392.22081.12.camel@mtls03> References: <1206452112.25950.360.camel@mtls03> <15ddcffd0803312353l57b0cfaft273c9f809387fb68@mail.gmail.com> <1207036392.22081.12.camel@mtls03> Message-ID: <47F1EFC0.70504@voltaire.com> Eli Cohen wrote: > On Tue, 2008-04-01 at 09:53 +0300, Or Gerlitz wrote: > >> Eli - what is the status here: is some editing needed to comply >> with the way the rest of the patches were merged (qp creation flags, >> etc), or is it applicable to review/merge as posted? >> > > I think it is, though I didn't yet have a chance to check it on top > of Roland's latest commits. > I see. 2.6.25 is at RC7 and we still have the interrupt moderation patches pending completion of review and merging, so... Or.
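The create_flags validation pattern quoted above for ehca — reject any flag the driver does not understand so that new flags can be added without silently misbehaving on old drivers — can be sketched in plain user-space C. The struct and flag names below are simplified stand-ins for the kernel's ib_qp_init_attr and IB_QP_CREATE_* values, not the real API:

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-in for the kernel's struct ib_qp_init_attr. */
struct qp_init_attr {
	int qp_type;
	unsigned int create_flags;	/* the new member under discussion */
};

/* Hypothetical flag name: the one creation flag the "mlx4-like" driver knows. */
#define QP_CREATE_IPOIB_UD_LSO (1u << 0)

/* A driver that supports no creation flags rejects any non-zero value,
 * mirroring the ehca hunk quoted above. Returns 0 or a negative errno. */
static int ehca_like_create_qp(const struct qp_init_attr *attr)
{
	if (attr->create_flags)
		return -EINVAL;
	return 0;
}

/* A driver that supports LSO accepts that one flag and rejects all others,
 * so future flags it does not understand still fail loudly. */
static int mlx4_like_create_qp(const struct qp_init_attr *attr)
{
	if (attr->create_flags & ~QP_CREATE_IPOIB_UD_LSO)
		return -EINVAL;
	return 0;
}
```

The mask-and-reject form in the second function is what lets the same create_flags field later carry XRC, low-latency, or block-loopback bits: each driver advertises only what it handles.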
From dotanb at dev.mellanox.co.il Tue Apr 1 05:45:10 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 1 Apr 2008 15:45:10 +0300 Subject: [ofa-general] [PATCH] librdmacm: fix typos in examples + start add port support Message-ID: <200804011545.10650.dotanb@dev.mellanox.co.il> Fixed a typo in a test name, plus spelling typos. Started adding support for controlling the port number from the command line.
Signed-off-by: Dotan Barak --- diff --git a/examples/cmatose.c b/examples/cmatose.c index 2f6e5f6..ba6299e 100644 --- a/examples/cmatose.c +++ b/examples/cmatose.c @@ -80,6 +80,7 @@ static struct cmatest test; static int connections = 1; static int message_size = 100; static int message_count = 10; +static uint16_t port = 7471; static uint8_t set_tos = 0; static uint8_t tos; static uint8_t migrate = 0; @@ -536,7 +537,7 @@ static int run_server(void) } else test.src_in.sin_family = PF_INET; - test.src_in.sin_port = 7471; + test.src_in.sin_port = port; ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { printf("cmatose: bind address failed: %d\n", ret); @@ -613,7 +614,7 @@ static int run_client(void) if (ret) return ret; - test.dst_in.sin_port = 7471; + test.dst_in.sin_port = port; printf("cmatose: connecting\n"); for (i = 0; i < connections; i++) { @@ -666,7 +667,7 @@ int main(int argc, char **argv) { int op, ret; - while ((op = getopt(argc, argv, "s:b:c:C:S:t:m")) != -1) { + while ((op = getopt(argc, argv, "s:b:c:C:S:t:p:m")) != -1) { switch (op) { case 's': dst_addr = optarg; @@ -687,6 +688,9 @@ int main(int argc, char **argv) set_tos = 1; tos = (uint8_t) atoi(optarg); break; + case 'p': + port = atoi(optarg); + break; case 'm': migrate = 1; break; @@ -698,6 +702,7 @@ int main(int argc, char **argv) printf("\t[-C message_count]\n"); printf("\t[-S message_size]\n"); printf("\t[-t type_of_service]\n"); + printf("\t[-p port_number]\n"); printf("\t[-m(igrate)]\n"); exit(1); } diff --git a/examples/rping.c b/examples/rping.c index 983ce1c..8bfa053 100644 --- a/examples/rping.c +++ b/examples/rping.c @@ -123,7 +123,7 @@ struct rping_cb { struct rping_rdma_info recv_buf;/* malloc'd buffer */ struct ibv_mr *recv_mr; /* MR associated with this buffer */ - struct ibv_send_wr sq_wr; /* send work requrest record */ + struct ibv_send_wr sq_wr; /* send work request record */ struct ibv_sge send_sgl; struct rping_rdma_info send_buf;/* single send buf */ struct ibv_mr 
*send_mr; @@ -600,7 +600,7 @@ static void *cq_thread(void *arg) pthread_exit(NULL); } if (ev_cq != cb->cq) { - fprintf(stderr, "Unkown CQ!\n"); + fprintf(stderr, "Unknown CQ!\n"); pthread_exit(NULL); } ret = ibv_req_notify_cq(cb->cq, 0); diff --git a/examples/udaddy.c b/examples/udaddy.c index 60d9e16..0d69b05 100644 --- a/examples/udaddy.c +++ b/examples/udaddy.c @@ -74,6 +74,7 @@ static struct cmatest test; static int connections = 1; static int message_size = 100; static int message_count = 10; +static uint16_t port = 7174; static uint8_t set_tos = 0; static uint8_t tos; static char *dst_addr; @@ -244,7 +245,7 @@ static int addr_handler(struct cmatest_node *node) ret = rdma_set_option(node->cma_id, RDMA_OPTION_ID, RDMA_OPTION_ID_TOS, &tos, sizeof tos); if (ret) - printf("cmatose: set TOS option failed: %d\n", ret); + printf("udaddy: set TOS option failed: %d\n", ret); } ret = rdma_resolve_route(node->cma_id, 2000); @@ -542,7 +543,7 @@ static int run_server(void) } else test.src_in.sin_family = PF_INET; - test.src_in.sin_port = 7174; + test.src_in.sin_port = port; ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { printf("udaddy: bind address failed: %d\n", ret); @@ -595,7 +596,7 @@ static int run_client(void) if (ret) return ret; - test.dst_in.sin_port = 7174; + test.dst_in.sin_port = port; printf("udaddy: connecting\n"); for (i = 0; i < connections; i++) { From dotanb at dev.mellanox.co.il Tue Apr 1 06:02:04 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 01 Apr 2008 16:02:04 +0300 Subject: [ofa-general] Re: the port numbers in some of the rdmacm examples is a fixed value In-Reply-To: <000101c8934b$265a46e0$37fc070a@amr.corp.intel.com> References: <47EBBC81.4030501@dev.mellanox.co.il> <000101c89022$ce0b3d30$9c98070a@amr.corp.intel.com> <47EF2A80.1020804@dev.mellanox.co.il> <000101c8934b$265a46e0$37fc070a@amr.corp.intel.com> Message-ID: <47F2324C.9060002@dev.mellanox.co.il> Sean Hefty wrote: >> I started to work on this patch and for 
>> ucmatose everything is fine. >> The problem is with udaddy: the parameter "-p" is only being used for >> the port space ... >> (I really would like to have the same parameter for controlling the port >> number for ALL of the examples >> of the librdmacm, but I must admit that without doing some changes it >> won't happen) ... >> > > I'd prefer to use the same parameter as well. If no one objects, I'm okay with > modifying the udaddy -p parameter. > > >> what do you think? >> (do you want me to send you the changes I made so far?) >> > > Sending me the changes that you have would be fine. I can finish them up when I > get some time. > OK, I sent you one patch which contains: 1) typo fixes (in the test name of an error message) plus spelling typos, and 2) the start of support for controlling the port numbers from the command line (if you wish, I can supply two separate patches). Only a minute of work is required to close this issue and fix udaddy's port number support. thanks Dotan
From swise at opengridcomputing.com Tue Apr 1 06:38:06 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 01 Apr 2008 08:38:06 -0500 Subject: [ofa-general] Re: summary on OFED 1.4 plans - re adding new kernel features/verbs through ofed In-Reply-To: <47F1D4E6.2010108@voltaire.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F108D2.1030606@opengridcomputing.com> <47F1D4E6.2010108@voltaire.com> Message-ID: <47F23ABE.8060102@opengridcomputing.com> Or Gerlitz wrote: > Steve Wise wrote: >> Tziporet Koren wrote: >>> >>> * OFED 1.4: * >>> 1. Kernel base: since we target the 1.4 release for September, we target a >>> kernel base of 2.6.27. >>> This is a good target, but we may need to stay with 2.6.26 if the >>> kernel's progress is not aligned. >>> 2. Suggestions for new features: >>> >>> * Verbs: Reliable Multicast (to be presented at Sonoma) >>> * IPoIB - continue with performance enhancements >>> >> Sorry I missed these meetings. For iWARP, here is my plan: >> New iWARP Verbs: >> - stag_alloc/dealloc >> - nsmr_fastreg >> - read-with-inv-local-stag >> - inv-local-stag >> Note the above verbs might be transport-independent. I believe the >> IBTA has defined a fastreg verb too? >> - peer-2-peer support in IWCM/Drivers > Steve, Tziporet, > > So you are talking about adding new verbs/features to the Linux RDMA > stack. Are you intending to do this through the mainline kernel cycles, > e.g. the general list, the maintainer (Roland), etc.? And if not, why? > Of course. All the work I've done and will do for the Linux RDMA core and Chelsio drivers is pushed upstream first, then submitted to OFED. Steve.
From sashak at voltaire.com Tue Apr 1 10:45:08 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 1 Apr 2008 17:45:08 +0000 Subject: [ofa-general] [PATCH] opensm/configure.in: improve readability of configured config files In-Reply-To: <47F023B6.2070302@mellanox.co.il> References: <20080330232119.GM13708@sashak.voltaire.com> <47F023B6.2070302@mellanox.co.il> Message-ID: <20080401174508.GB27321@sashak.voltaire.com> When the ./configure script is executed, it will show the values used for these config files, like this: checking for --with-opensm-conf-sub-dir... /etc/opensm checking for --with-node-name-map ... ib-node-name-map checking for --with-partitions-conf... partitions.conf checking for --with-qos-policy-conf... qos-policy.conf (note that for --with-opensm-conf-sub-dir the full path is shown) rather than just whether a value was redefined from its default (checking for --with-partitions-conf... no, etc.). Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 35 +++++++++++++++-------------------- 1 files changed, 15 insertions(+), 20 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index a5b7c5a..0da402a 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -82,6 +82,10 @@ OPENIB_OSM_CONSOLE_SOCKET_SEL dnl select performance manager or not OPENIB_OSM_PERF_MGR_SEL +dnl resolve config dir.
+conf_dir_tmp1="`eval echo ${sysconfdir} | sed 's/^NONE/$ac_default_prefix/'`" +SYS_CONFIG_DIR="`eval echo $conf_dir_tmp1`" + dnl Check for a different subdir for the config files. OPENSM_CONF_SUB_DIR=opensm AC_MSG_CHECKING(for --with-opensm-conf-sub-dir) @@ -92,23 +96,17 @@ AC_ARG_WITH(opensm-conf-sub-dir, no) ;; *) - withopensmconfsubdir=yes OPENSM_CONF_SUB_DIR=$withval ;; esac ] ) -AC_MSG_RESULT(${withopensmconfsubdir=no}) -AC_SUBST(OPENSM_CONF_SUB_DIR) - -dnl Set up /opensm config dir. -CONF_DIR_TMP1="`eval echo ${sysconfdir}/$OPENSM_CONF_SUB_DIR`" -CONF_DIR_TMP2="`echo $CONF_DIR_TMP1 | sed 's/^NONE/$ac_default_prefix/'`" -CONF_DIR="`eval echo $CONF_DIR_TMP2`" - +OPENSM_CONFIG_DIR=$SYS_CONFIG_DIR/$OPENSM_CONF_SUB_DIR +AC_MSG_RESULT($OPENSM_CONFIG_DIR) AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, - ["$CONF_DIR"], + ["$OPENSM_CONFIG_DIR"], [Define OpenSM config directory]) -AC_SUBST(CONF_DIR) +AC_SUBST(OPENSM_CONF_SUB_DIR) +AC_SUBST(CONF_DIR,$OPENSM_CONFIG_DIR) dnl Check for a different default node name map file NODENAMEMAPFILE=ib-node-name-map @@ -120,14 +118,13 @@ AC_ARG_WITH(node-name-map, no) ;; *) - withnodenamemap=yes NODENAMEMAPFILE=$withval ;; esac ] ) -AC_MSG_RESULT(${withnodenamemap=no}) +AC_MSG_RESULT($NODENAMEMAPFILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_NODENAME_MAP, - ["$CONF_DIR/$NODENAMEMAPFILE"], + ["$OPENSM_CONFIG_DIR/$NODENAMEMAPFILE"], [Define a default node name map file]) AC_SUBST(NODENAMEMAPFILE) @@ -141,14 +138,13 @@ AC_ARG_WITH(partitions-conf, no) ;; *) - withpartitionsconf=yes PARTITION_CONFIG_FILE=$withval ;; esac ] ) -AC_MSG_RESULT(${withpartitionsconf=no}) +AC_MSG_RESULT($PARTITION_CONFIG_FILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_PARTITION_CONFIG_FILE, - ["$CONF_DIR/$PARTITION_CONFIG_FILE"], + ["$OPENSM_CONFIG_DIR/$PARTITION_CONFIG_FILE"], [Define a Partition config file]) AC_SUBST(PARTITION_CONFIG_FILE) @@ -162,14 +158,13 @@ AC_ARG_WITH(qos-policy-conf, no) ;; *) - withqospolicyconf=yes QOS_POLICY_FILE=$withval ;; esac ] ) 
-AC_MSG_RESULT(${withqospolicyconf=no}) +AC_MSG_RESULT($QOS_POLICY_FILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_QOS_POLICY_FILE, - ["$CONF_DIR/$QOS_POLICY_FILE"], + ["$OPENSM_CONFIG/$QOS_POLICY_FILE"], [Define a QOS policy config file]) AC_SUBST(QOS_POLICY_FILE) -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 1 10:52:46 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 1 Apr 2008 17:52:46 +0000 Subject: [ofa-general] [PATCH] opensm/configure.in: replace CONF_DIR config var by OSM_CONFIG_DIR In-Reply-To: <20080401174508.GB27321@sashak.voltaire.com> References: <20080330232119.GM13708@sashak.voltaire.com> <47F023B6.2070302@mellanox.co.il> <20080401174508.GB27321@sashak.voltaire.com> Message-ID: <20080401175246.GC27321@sashak.voltaire.com> Replace CONF_DIR config variable by OSM_CONFIG_DIR in substitution patterns. Remove not needed OPENSM_CONF_SUB_DIR var. Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 8 +++----- opensm/man/opensm.8.in | 16 ++++++++-------- opensm/opensm.spec.in | 8 ++++---- opensm/scripts/opensmd.in | 4 ++-- opensm/scripts/redhat-opensm.init.in | 4 ++-- 5 files changed, 19 insertions(+), 21 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index 0da402a..1f2bed5 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -87,7 +87,7 @@ conf_dir_tmp1="`eval echo ${sysconfdir} | sed 's/^NONE/$ac_default_prefix/'`" SYS_CONFIG_DIR="`eval echo $conf_dir_tmp1`" dnl Check for a different subdir for the config files. 
-OPENSM_CONF_SUB_DIR=opensm +OPENSM_CONFIG_DIR=$SYS_CONFIG_DIR/opensm AC_MSG_CHECKING(for --with-opensm-conf-sub-dir) AC_ARG_WITH(opensm-conf-sub-dir, AC_HELP_STRING([--with-opensm-conf-sub-dir=dir], @@ -96,17 +96,15 @@ AC_ARG_WITH(opensm-conf-sub-dir, no) ;; *) - OPENSM_CONF_SUB_DIR=$withval + OPENSM_CONFIG_DIR=$SYS_CONFIG_DIR/$withval ;; esac ] ) -OPENSM_CONFIG_DIR=$SYS_CONFIG_DIR/$OPENSM_CONF_SUB_DIR AC_MSG_RESULT($OPENSM_CONFIG_DIR) AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, ["$OPENSM_CONFIG_DIR"], [Define OpenSM config directory]) -AC_SUBST(OPENSM_CONF_SUB_DIR) -AC_SUBST(CONF_DIR,$OPENSM_CONFIG_DIR) +AC_SUBST(OPENSM_CONFIG_DIR) dnl Check for a different default node name map file NODENAMEMAPFILE=ib-node-name-map diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index 93fa95c..e93844d 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -201,21 +201,21 @@ is accumulative. .TP \fB\-P\fR, \fB\-\-Pconfig\fR This option defines the optional partition configuration file. -The default name is \fB\%@CONF_DIR@/@PARTITION_CONFIG_FILE@\fP. +The default name is \fB\%@OPENSM_CONFIG_DIR@/@PARTITION_CONFIG_FILE@\fP. .TP .BI --prefix_routes_file= path Prefix routes control how the SA responds to path record queries for off-subnet DGIDs. By default, the SA fails such queries. The .B PREFIX ROUTES section below describes the format of the configuration file. -The default path is \fB\%@CONF_DIR@/prefix\-routes.conf\fP. +The default path is \fB\%@OPENSM_CONFIG_DIR@/prefix\-routes.conf\fP. .TP \fB\-Q\fR, \fB\-\-qos\fR This option enables QoS setup. It is disabled by default. .TP \fB\-Y\fR, \fB\-\-qos_policy_file\fR This option defines the optional QoS policy file. The default -name is \fB\%@CONF_DIR@/@QOS_POLICY_FILE@\fP. +name is \fB\%@OPENSM_CONFIG_DIR@/@QOS_POLICY_FILE@\fP. .TP \fB\-N\fR, \fB\-\-no_part_enforce\fR This option disables partition enforcement on switch external ports. @@ -331,7 +331,7 @@ logrotate purposes. 
.SH PARTITION CONFIGURATION .PP The default name of OpenSM partitions configuration file is -\fB\%@CONF_DIR@/@PARTITION_CONFIG_FILE@\fP. The default may be changed by using +\fB\%@OPENSM_CONFIG_DIR@/@PARTITION_CONFIG_FILE@\fP. The default may be changed by using --Pconfig (-P) option with OpenSM. The default partition will be created by OpenSM unconditionally even @@ -926,19 +926,19 @@ Both or one of options -U and -M can be specified together with \'-R file\'. .SH FILES .TP -.B @CONF_DIR@/@NODENAMEMAPFILE@ +.B @OPENSM_CONFIG_DIR@/@NODENAMEMAPFILE@ default node name map file. See ibnetdiscover for more information on format. .TP -.B @CONF_DIR@/@PARTITION_CONFIG_FILE@ +.B @OPENSM_CONFIG_DIR@/@PARTITION_CONFIG_FILE@ default partition config file .TP -.B @CONF_DIR@/@QOS_POLICY_FILE@ +.B @OPENSM_CONFIG_DIR@/@QOS_POLICY_FILE@ default QOS policy config file .TP -.B @CONF_DIR@/prefix-routes.conf +.B @OPENSM_CONFIG_DIR@/prefix-routes.conf default prefix routes file. .SH AUTHORS diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in index 882e6e4..feabfef 100644 --- a/opensm/opensm.spec.in +++ b/opensm/opensm.spec.in @@ -94,9 +94,9 @@ if [ -f /etc/redhat-release -o -s /etc/redhat-release ]; then else REDHAT="" fi -mkdir -p $etc/{init.d, at OPENSM_CONF_SUB_DIR@,logrotate.d} +mkdir -p $etc/{init.d,logrotate.d} @OPENSM_CONFIG_DIR@ install -m 755 scripts/${REDHAT}opensm.init $etc/init.d/opensmd -install -m 644 scripts/opensm.conf $etc/@OPENSM_CONF_SUB_DIR@/opensm.conf +install -m 644 scripts/opensm.conf @OPENSM_CONFIG_DIR@/opensm.conf install -m 644 scripts/opensm.logrotate $etc/logrotate.d/opensm install -m 755 scripts/sldd.sh $RPM_BUILD_ROOT%{_sbindir}/sldd.sh @@ -128,10 +128,10 @@ fi %doc AUTHORS COPYING README %{_sysconfdir}/init.d/opensmd %{_sbindir}/sldd.sh -%config(noreplace) %{_sysconfdir}/@OPENSM_CONF_SUB_DIR@/opensm.conf +%config(noreplace) @OPENSM_CONFIG_DIR@/opensm.conf %config(noreplace) %{_sysconfdir}/logrotate.d/opensm %dir /var/cache/opensm -%dir 
%{_sysconfdir}/@OPENSM_CONF_SUB_DIR@ +%dir @OPENSM_CONFIG_DIR@ %files libs %defattr(-,root,root,-) diff --git a/opensm/scripts/opensmd.in b/opensm/scripts/opensmd.in index 434a92c..7e5d868 100755 --- a/opensm/scripts/opensmd.in +++ b/opensm/scripts/opensmd.in @@ -28,13 +28,13 @@ # # # processname: @sbindir@/opensm -# config: @sysconfig@/opensm.conf +# config: @OPENSM_CONFIG_DIR@/opensm.conf # pidfile: /var/run/opensm.pid prefix=@prefix@ exec_prefix=@exec_prefix@ -CONFIG=@sysconfdir@/@OPENSM_CONF_SUB_DIR@/opensm.conf +CONFIG=@OPENSM_CONFIG_DIR@/opensm.conf if [ ! -f $CONFIG ]; then exit 0 diff --git a/opensm/scripts/redhat-opensm.init.in b/opensm/scripts/redhat-opensm.init.in index 689ffa0..5cc9079 100755 --- a/opensm/scripts/redhat-opensm.init.in +++ b/opensm/scripts/redhat-opensm.init.in @@ -38,7 +38,7 @@ # $Id: openib-1.0-opensm.init,v 1.5 2006/08/02 18:18:23 dledford Exp $ # # processname: @sbindir@/opensm -# config: @sysconfdir@/@OPENSM_CONF_SUB_DIR@/opensm.conf +# config: @OPENSM_CONFIG_DIR@/opensm.conf # pidfile: /var/run/opensm.pid prefix=@prefix@ @@ -46,7 +46,7 @@ exec_prefix=@exec_prefix@ . /etc/rc.d/init.d/functions -CONFIG=@sysconfdir@/@OPENSM_CONF_SUB_DIR@/opensm.conf +CONFIG=@OPENSM_CONFIG_DIR@/opensm.conf if [ ! -f $CONFIG ]; then exit 0 fi -- 1.5.4.1.122.gaa8d
Investors are discovering this hidden gem Grab this gem while its in cents it wont last there long. Ride the gains with DCNM DnC Multimedia Corporation Today From eli at dev.mellanox.co.il Tue Apr 1 08:35:46 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 01 Apr 2008 18:35:46 +0300 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: References: <1205767431.25950.138.camel@mtls03> Message-ID: <1207064146.3781.19.camel@mtls03> Roland, would like me to re-generate the mlx4 LSO patch to match this commit or would you do the adjustments? On Fri, 2008-03-28 at 14:39 -0700, Roland Dreier wrote: > thanks, applied as below. > > For now I left the IB_WR_LSO opcode rather than a send flag, since the > mlx4 internal implementation is as a new opcode. However since this > is kernel-internal we can revisit this and I'm happy if the discussion > continues. > > From 86a0dd93c39739a39d6b5f7f67d4b2456c5f45ae Mon Sep 17 00:00:00 2001 > From: Eli Cohen > Date: Mon, 17 Mar 2008 17:23:51 +0200 > Subject: [PATCH] IB/core: Add IPoIB UD LSO support > > LSO (large send offload) allows the networking stack to pass SKBs with > data size larger than the MTU to the IPoIB driver and have the HCA HW > fragment the data to multiple MSS-sized packets. Add a device > capability flag IB_DEVICE_UD_TSO for devices that can perform TCP > segmentation offload, a new send work request opcode IB_WR_LSO, > header, hlen and mss fields for the work request structure, and a new > IB_WC_LSO completion type. > > Signed-off-by: Eli Cohen > Signed-off-by: Roland Dreier > --- > include/rdma/ib_verbs.h | 8 +++++++- > 1 files changed, 7 insertions(+), 1 deletions(-) > > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h > index 3ac7371..5fe7723 100644 > --- a/include/rdma/ib_verbs.h > +++ b/include/rdma/ib_verbs.h > @@ -104,6 +104,7 @@ enum ib_device_cap_flags { > * IPoIB driver may set NETIF_F_IP_CSUM for datagram mode. 
> */ > IB_DEVICE_UD_IP_CSUM = (1<<18), > + IB_DEVICE_UD_TSO = (1<<19), > IB_DEVICE_SEND_W_INV = (1<<21), > }; > > @@ -412,6 +413,7 @@ enum ib_wc_opcode { > IB_WC_COMP_SWAP, > IB_WC_FETCH_ADD, > IB_WC_BIND_MW, > + IB_WC_LSO, > /* > * Set value of IB_WC_RECV so consumers can test if a completion is a > * receive by testing (opcode & IB_WC_RECV). > @@ -623,7 +625,8 @@ enum ib_wr_opcode { > IB_WR_SEND_WITH_IMM, > IB_WR_RDMA_READ, > IB_WR_ATOMIC_CMP_AND_SWP, > - IB_WR_ATOMIC_FETCH_AND_ADD > + IB_WR_ATOMIC_FETCH_AND_ADD, > + IB_WR_LSO > }; > > enum ib_send_flags { > @@ -662,6 +665,9 @@ struct ib_send_wr { > } atomic; > struct { > struct ib_ah *ah; > + void *header; > + int hlen; > + int mss; > u32 remote_qpn; > u32 remote_qkey; > u16 pkey_index; /* valid for GSI only */ From sashak at voltaire.com Tue Apr 1 11:45:14 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 1 Apr 2008 18:45:14 +0000 Subject: [ofa-general] [PATCH] opensm/configure.in: make prefix routes config file configurable In-Reply-To: <20080401175246.GC27321@sashak.voltaire.com> References: <20080330232119.GM13708@sashak.voltaire.com> <47F023B6.2070302@mellanox.co.il> <20080401174508.GB27321@sashak.voltaire.com> <20080401175246.GC27321@sashak.voltaire.com> Message-ID: <20080401184514.GD27321@sashak.voltaire.com> Add configuration ability for prefix routes config file, similar to other OpenSM config files.
Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 20 ++++++++++++++++++++ opensm/man/opensm.8.in | 2 +- 2 files changed, 21 insertions(+), 1 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index 1f2bed5..7cf7076 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -166,6 +166,26 @@ AC_DEFINE_UNQUOTED(HAVE_DEFAULT_QOS_POLICY_FILE, [Define a QOS policy config file]) AC_SUBST(QOS_POLICY_FILE) +dnl Check for a different prefix-routes file +PREFIX_ROUTES_FILE=prefix-routes.conf +AC_MSG_CHECKING(for --with-prefix-routes-conf) +AC_ARG_WITH(prefix-routes-conf, + AC_HELP_STRING([--with-prefix-routes-conf=file], + [define a Prefix Routes config file (default is prefix-routes.conf)]), + [ case "$withval" in + no) + ;; + *) + PREFIX_ROUTES_FILE=$withval + ;; + esac ] +) +AC_MSG_RESULT($PREFIX_ROUTES_FILE) +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_PREFIX_ROUTES_FILE, + ["$OPENSM_CONFIG/$PREFIX_ROUTES_FILE"], + [Define a Prefix Routes config file]) +AC_SUBST(PREFIX_ROUTES_FILE) + dnl select example event plugin or not OPENIB_OSM_DEFAULT_EVENT_PLUGIN_SEL diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index e93844d..1c47160 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -938,7 +938,7 @@ default partition config file default QOS policy config file .TP -.B @OPENSM_CONFIG_DIR@/prefix-routes.conf +.B @OPENSM_CONFIG_DIR@/@PREFIX_ROUTES_FILE@ default prefix routes file. 
.SH AUTHORS -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 1 11:55:09 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 1 Apr 2008 18:55:09 +0000 Subject: [ofa-general] [PATCH] opensm/osm_base.h: use OPENSM_CONFIG_DIR in config files paths definitions In-Reply-To: <20080401184514.GD27321@sashak.voltaire.com> References: <20080330232119.GM13708@sashak.voltaire.com> <47F023B6.2070302@mellanox.co.il> <20080401174508.GB27321@sashak.voltaire.com> <20080401175246.GC27321@sashak.voltaire.com> <20080401184514.GD27321@sashak.voltaire.com> Message-ID: <20080401185509.GE27321@sashak.voltaire.com> Use OPENSM_CONFIG_DIR for config file path definitions when the appropriate HAVE_*_FILE macros are not set. Use /etc/opensm as the default OpenSM config directory. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_base.h | 32 ++++++++++++++++---------------- 1 files changed, 16 insertions(+), 16 deletions(-) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index 1a9abf0..cbe8205 100644 --- a/opensm/include/opensm/osm_base.h +++ b/opensm/include/opensm/osm_base.h @@ -224,12 +224,12 @@ BEGIN_C_DECLS */ #ifdef __WIN__ #define OSM_DEFAULT_PARTITION_CONFIG_FILE strcat(GetOsmCachePath(), "osm-partitions.conf") -#else /* !__WIN__ */ -# ifdef HAVE_DEFAULT_PARTITION_CONFIG_FILE -# define OSM_DEFAULT_PARTITION_CONFIG_FILE HAVE_DEFAULT_PARTITION_CONFIG_FILE -# else /* !HAVE_DEFAULT_PARTITION_CONFIG_FILE */ -# define OSM_DEFAULT_PARTITION_CONFIG_FILE "/etc/ofa/opensm-partitions.conf" -# endif /* HAVE_DEFAULT_PARTITION_CONFIG_FILE */ +#elif defined(HAVE_DEFAULT_PARTITION_CONFIG_FILE) +#define OSM_DEFAULT_PARTITION_CONFIG_FILE HAVE_DEFAULT_PARTITION_CONFIG_FILE +#elif defined(OPENSM_CONFIG_DIR) +#define OSM_DEFAULT_PARTITION_CONFIG_FILE OPENSM_CONFIG_DIR "/partitions.conf" +#else +#define OSM_DEFAULT_PARTITION_CONFIG_FILE "/etc/opensm/partitions.conf" #endif /* __WIN__ */ /***********/ @@ -244,12 +244,12 @@ BEGIN_C_DECLS */ #ifdef __WIN__
#define OSM_DEFAULT_QOS_POLICY_FILE strcat(GetOsmCachePath(), "osm-qos-policy.conf") -#else /* !__WIN__ */ -# ifdef HAVE_DEFAULT_QOS_POLICY_FILE -# define OSM_DEFAULT_QOS_POLICY_FILE HAVE_DEFAULT_QOS_POLICY_FILE -# else /* !HAVE_DEFAULT_QOS_POLICY_FILE */ -# define OSM_DEFAULT_QOS_POLICY_FILE "/etc/ofa/opensm-qos-policy.conf" -# endif /* HAVE_DEFAULT_QOS_POLICY_FILE */ +#elif defined(HAVE_DEFAULT_QOS_POLICY_FILE) +#define OSM_DEFAULT_QOS_POLICY_FILE HAVE_DEFAULT_QOS_POLICY_FILE +#elif defined(OPENSM_CONFIG_DIR) +#define OSM_DEFAULT_QOS_POLICY_FILE OPENSM_CONFIG_DIR "/qos-policy.conf" +#else +#define OSM_DEFAULT_QOS_POLICY_FILE "/etc/opensm/qos-policy.conf" #endif /* __WIN__ */ /***********/ @@ -264,12 +264,12 @@ BEGIN_C_DECLS */ #ifdef __WIN__ #define OSM_DEFAULT_PREFIX_ROUTES_FILE strcat(GetOsmCachePath(), "osm-prefix-routes.conf") -#else -#ifdef OPENSM_CONFIG_DIR +#elif defined(HAVE_DEFAULT_PREFIX_ROUTES_FILE) +#define OSM_DEFAULT_PREFIX_ROUTES_FILE HAVE_DEFAULT_PREFIX_ROUTES_FILE +#elif defined(OPENSM_CONFIG_DIR) #define OSM_DEFAULT_PREFIX_ROUTES_FILE OPENSM_CONFIG_DIR "/prefix-routes.conf" #else -#define OSM_DEFAULT_PREFIX_ROUTES_FILE "/etc/ofa/opensm-prefix-routes.conf" -#endif +#define OSM_DEFAULT_PREFIX_ROUTES_FILE "/etc/opensm/prefix-routes.conf" #endif /***********/ -- 1.5.4.1.122.gaa8d From gstreiff at NetEffect.com Tue Apr 1 09:11:47 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Tue, 1 Apr 2008 11:11:47 -0500 Subject: [ofa-general] RE: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC079500B6@venom2> > OFED March 24 meeting summary about OFED 1.4 and 1.3.1 plans: > 1.3.1 Release: > As we decided we should do a release in 2-3 months after 1.3. > In addition if there are any special fixes as outcome from the > interop we can do a release earlier.
> All - please send me your requests for fixed issues and needed time > frame and I will publish 1.3.1 schedule based on this. Hi, Tziporet. Just to refresh what I said at the last conference call, NetEffect has at least one fix (already upstream) that we would like to see in an OFED 1.3.1 build. In terms of desired timeframe...late April or early May? Have fun at Sonoma. Dave Sommers from NetEffect will be there while I work through my backlog. :-/ Regards, Glenn From Arkady.Kanevsky at netapp.com Tue Apr 1 09:30:25 2008 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Tue, 1 Apr 2008 12:30:25 -0400 Subject: [ofa-general] Re: files preamble In-Reply-To: <47EF7054.6020503@voltaire.com> References: <47E5EF49.9080506@voltaire.com> <47EC06AF.8000309@sun.com> <47EF6B57.40502@voltaire.com> <47EF7054.6020503@voltaire.com> Message-ID: I am very doubtful that you can remove it. Some of that is based on earlier work by IBM in DAPL, which was submitted under 3 licenses. My goal was to globally change the preamble from OpenIB to OpenFabrics. Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 > -----Original Message----- > From: Or Gerlitz [mailto:ogerlitz at voltaire.com] > Sent: Sunday, March 30, 2008 6:50 AM > To: Ted H. Kim > Cc: openib-general at openib.org > Subject: Re: [ofa-general] Re: files preamble > > Or Gerlitz wrote: > > Ted H. Kim wrote: > >> For example, it appears addr.c, cma.c, ib_addr.h, rdma_cm.h and > >> rdma_cm_ib.h -- all have the "Common Public License 1.0" > >> mentioned. > >> > > I have no idea what the common public license is, generally > speaking I > > think it would be fine if you send a patch that removes it from all > > the files under drivers/infiniband and include/rdma.
> Hi Ted, > > OK, as its about legals, if you want to drive the removal of > this license from the files, best if its first being > discussed in an appropriate forum which I am not sure if this > list is, so in that respect, I take back my proposal for you > to send a patch... > > Or. > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From arlin.r.davis at intel.com Tue Apr 1 11:17:22 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 1 Apr 2008 11:17:22 -0700 Subject: [ofa-general] [PATCH 1/1][v2] dapl: calculate private data size based on transport type and cma_hdr overhead Message-ID: <000001c89424$a1673c10$9f97070a@amr.corp.intel.com> Need to adjust CM private date size based on different transport types. Add hca_ptr to dapls_ib_private_data_size call for transport type validation via verbs device. Add definitions to include iWARP size of 512 and subtract 36 bytes for cma_hdr overhead. 
Signed-off-by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_adapter_util.h | 3 ++- dapl/common/dapl_cr_callback.c | 3 ++- dapl/common/dapl_ep_connect.c | 4 +++- dapl/common/dapl_evd_connection_callb.c | 4 +++- dapl/common/dapl_ia_query.c | 5 +++-- dapl/ibal-scm/dapl_ibal-scm_cm.c | 4 +++- dapl/ibal/dapl_ibal_cm.c | 2 ++ dapl/openib/dapl_ib_cm.c | 4 +++- dapl/openib_cma/dapl_ib_cm.c | 10 ++++++++-- dapl/openib_cma/dapl_ib_util.h | 14 ++++++++------ dapl/openib_scm/dapl_ib_cm.c | 4 +++- 11 files changed, 40 insertions(+), 17 deletions(-) diff --git a/dapl/common/dapl_adapter_util.h b/dapl/common/dapl_adapter_util.h index 6738d6a..d664bf6 100755 --- a/dapl/common/dapl_adapter_util.h +++ b/dapl/common/dapl_adapter_util.h @@ -239,7 +239,8 @@ DAT_RETURN dapls_ib_cm_remote_addr ( int dapls_ib_private_data_size ( IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op); + IN DAPL_PDATA_OP conn_op, + IN DAPL_HCA *hca_ptr); void dapls_query_provider_specific_attr( diff --git a/dapl/common/dapl_cr_callback.c b/dapl/common/dapl_cr_callback.c index e8f58a4..46d2b4c 100644 --- a/dapl/common/dapl_cr_callback.c +++ b/dapl/common/dapl_cr_callback.c @@ -378,7 +378,8 @@ dapli_connection_request ( else { cr_ptr->param.private_data_size = - dapls_ib_private_data_size (prd_ptr, DAPL_PDATA_CONN_REQ); + dapls_ib_private_data_size(prd_ptr, DAPL_PDATA_CONN_REQ, + sp_ptr->header.owner_ia->hca_ptr); } if (cr_ptr->param.private_data_size > 0) { diff --git a/dapl/common/dapl_ep_connect.c b/dapl/common/dapl_ep_connect.c index 12d391f..f290ebe 100755 --- a/dapl/common/dapl_ep_connect.c +++ b/dapl/common/dapl_ep_connect.c @@ -258,7 +258,9 @@ dapl_ep_connect ( */ req_hdr_size = (sizeof (DAPL_PRIVATE) - DAPL_MAX_PRIVATE_DATA_SIZE); - max_req_pdata_size = dapls_ib_private_data_size (NULL, DAPL_PDATA_CONN_REQ); + max_req_pdata_size = dapls_ib_private_data_size( + NULL, DAPL_PDATA_CONN_REQ, + ep_ptr->header.owner_ia->hca_ptr); if (private_data_size + req_hdr_size >
(DAT_COUNT)max_req_pdata_size) { diff --git a/dapl/common/dapl_evd_connection_callb.c b/dapl/common/dapl_evd_connection_callb.c index 3c4e0cb..d3a39a6 100644 --- a/dapl/common/dapl_evd_connection_callb.c +++ b/dapl/common/dapl_evd_connection_callb.c @@ -148,7 +148,9 @@ dapl_evd_connection_callback ( else { private_data_size = - dapls_ib_private_data_size (prd_ptr, DAPL_PDATA_CONN_REP); + dapls_ib_private_data_size( + prd_ptr, DAPL_PDATA_CONN_REP, + ep_ptr->header.owner_ia->hca_ptr); } if (private_data_size > 0) diff --git a/dapl/common/dapl_ia_query.c b/dapl/common/dapl_ia_query.c index 593f356..a8c39a3 100755 --- a/dapl/common/dapl_ia_query.c +++ b/dapl/common/dapl_ia_query.c @@ -156,8 +156,9 @@ dapl_ia_query ( * to 0 unless IBHOSTS_NAMING is enabled. */ provider_attr->max_private_data_size = - dapls_ib_private_data_size (NULL, DAPL_PDATA_CONN_REQ) - - (sizeof (DAPL_PRIVATE) - DAPL_MAX_PRIVATE_DATA_SIZE); + dapls_ib_private_data_size(NULL, DAPL_PDATA_CONN_REQ, + ia_ptr->hca_ptr) - + (sizeof(DAPL_PRIVATE) - DAPL_MAX_PRIVATE_DATA_SIZE); provider_attr->supports_multipath = DAT_FALSE; provider_attr->ep_creator = DAT_PSP_CREATES_EP_NEVER; provider_attr->optimal_buffer_alignment = DAT_OPTIMAL_ALIGNMENT; diff --git a/dapl/ibal-scm/dapl_ibal-scm_cm.c b/dapl/ibal-scm/dapl_ibal-scm_cm.c index 692e5b9..fcf5215 100644 --- a/dapl/ibal-scm/dapl_ibal-scm_cm.c +++ b/dapl/ibal-scm/dapl_ibal-scm_cm.c @@ -1019,6 +1019,7 @@ dapls_ib_cm_remote_addr ( * Input: * prd_ptr private data pointer * conn_op connection operation type + * hca_ptr hca pointer, needed for transport type * * If prd_ptr is NULL, this is a query for the max size supported by * the provider, otherwise it is the actual size of the private data @@ -1034,7 +1035,8 @@ dapls_ib_cm_remote_addr ( */ int dapls_ib_private_data_size ( IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op) + IN DAPL_PDATA_OP conn_op, + IN DAPL_HCA *hca_ptr) { int size; diff --git a/dapl/ibal/dapl_ibal_cm.c b/dapl/ibal/dapl_ibal_cm.c index 
9f3ffc4..6cd652f 100644 --- a/dapl/ibal/dapl_ibal_cm.c +++ b/dapl/ibal/dapl_ibal_cm.c @@ -1679,6 +1679,7 @@ dapls_ib_cr_handoff ( * Return the size of private data given a connection op type * * Input: + * hca_ptr hca pointer, needed for transport type * prd_ptr private data pointer * conn_op connection operation type * @@ -1697,6 +1698,7 @@ dapls_ib_cr_handoff ( */ int dapls_ib_private_data_size ( + IN DAPL_HCA *hca_ptr, IN DAPL_PRIVATE *prd_ptr, IN DAPL_PDATA_OP conn_op) { diff --git a/dapl/openib/dapl_ib_cm.c b/dapl/openib/dapl_ib_cm.c index 2ff2ba0..76d5968 100644 --- a/dapl/openib/dapl_ib_cm.c +++ b/dapl/openib/dapl_ib_cm.c @@ -1049,6 +1049,7 @@ dapls_ib_cm_remote_addr ( * Input: * prd_ptr private data pointer * conn_op connection operation type + * hca_ptr hca pointer, needed for transport type * * If prd_ptr is NULL, this is a query for the max size supported by * the provider, otherwise it is the actual size of the private data @@ -1064,7 +1065,8 @@ dapls_ib_cm_remote_addr ( */ int dapls_ib_private_data_size ( IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op) + IN DAPL_PDATA_OP conn_op, + IN DAPL_HCA *hca_ptr) { int size; diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index 04b9e41..cf79142 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -972,6 +972,7 @@ dapls_ib_cm_remote_addr(IN DAT_HANDLE dat_handle, OUT DAT_SOCK_ADDR6 *raddr) * Input: * prd_ptr private data pointer * conn_op connection operation type + * hca_ptr hca pointer, needed for transport type * * If prd_ptr is NULL, this is a query for the max size supported by * the provider, otherwise it is the actual size of the private data @@ -985,11 +986,16 @@ dapls_ib_cm_remote_addr(IN DAT_HANDLE dat_handle, OUT DAT_SOCK_ADDR6 *raddr) * length of private data * */ -int dapls_ib_private_data_size(IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op) +int dapls_ib_private_data_size( IN DAPL_PRIVATE *prd_ptr, + IN DAPL_PDATA_OP conn_op, + IN 
DAPL_HCA *hca_ptr) { int size; + if (hca_ptr->ib_hca_handle->device->transport_type + == IBV_TRANSPORT_IWARP) + return(IWARP_MAX_PDATA_SIZE); + switch(conn_op) { case DAPL_PDATA_CONN_REQ: diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 2f01fc3..f35cb9d 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -113,12 +113,14 @@ typedef struct _ib_wait_obj_handle /* inline send rdma threshold */ #define INLINE_SEND_DEFAULT 64 -/* CM private data areas */ -#define IB_MAX_REQ_PDATA_SIZE 48 -#define IB_MAX_REP_PDATA_SIZE 196 -#define IB_MAX_REJ_PDATA_SIZE 148 -#define IB_MAX_DREQ_PDATA_SIZE 220 -#define IB_MAX_DREP_PDATA_SIZE 224 +/* CMA private data areas */ +#define CMA_PDATA_HDR 36 +#define IB_MAX_REQ_PDATA_SIZE (92-CMA_PDATA_HDR) +#define IB_MAX_REP_PDATA_SIZE (196-CMA_PDATA_HDR) +#define IB_MAX_REJ_PDATA_SIZE (148-CMA_PDATA_HDR) +#define IB_MAX_DREQ_PDATA_SIZE (220-CMA_PDATA_HDR) +#define IB_MAX_DREP_PDATA_SIZE (224-CMA_PDATA_HDR) +#define IWARP_MAX_PDATA_SIZE (512-CMA_PDATA_HDR) /* DTO OPs, ordered for DAPL ENUM definitions */ #define OP_RDMA_WRITE IBV_WR_RDMA_WRITE diff --git a/dapl/openib_scm/dapl_ib_cm.c b/dapl/openib_scm/dapl_ib_cm.c index f534e8d..485ab9b 100644 --- a/dapl/openib_scm/dapl_ib_cm.c +++ b/dapl/openib_scm/dapl_ib_cm.c @@ -827,6 +827,7 @@ dapls_ib_cm_remote_addr ( * Input: * prd_ptr private data pointer * conn_op connection operation type + * hca_ptr hca pointer, needed for transport type * * If prd_ptr is NULL, this is a query for the max size supported by * the provider, otherwise it is the actual size of the private data @@ -842,7 +843,8 @@ dapls_ib_cm_remote_addr ( */ int dapls_ib_private_data_size ( IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op) + IN DAPL_PDATA_OP conn_op, + IN DAPL_HCA *hca_ptr) { int size; -- 1.5.2.5 From chu11 at llnl.gov Tue Apr 1 11:29:39 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 01 Apr 2008 11:29:39 -0700 Subject: [ofa-general] 
[Infiniband-Diags] [PATCH] saquery exit with non-zero code on bad input Message-ID: <1207074579.15637.153.camel@cardanus.llnl.gov> Hey Sasha, If an input into saquery isn't found, saquery still exits with '0' status, so it poses a problem in scripting. This patch exits w/ non-zero if the input isn't found by saquery. The actual status code I selected to return can be revised. I just sort of picked one. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-exit-non-zero-if-saquery-input-not-found.patch Type: text/x-patch Size: 2134 bytes Desc: not available URL: From ttubby at pearlriverresort.com Tue Apr 1 11:46:51 2008 From: ttubby at pearlriverresort.com (Rosendo Fox) Date: Tue, 1 Apr 2008 13:46:51 -0500 Subject: [ofa-general] Compare prices and buy here Message-ID: <249563945.37193639129250@pearlriverresort.com> Get your software without delay. Simply pay and download immediately. Programs are available here in all European languages, built for Windows and Macintosh. All software is very inexpensive; every copy is guaranteed to be an original, complete and fully functional version. Professional, personal advice from our customer center will help you with the software installation. Fast answers guaranteed. A money-back guarantee is available! Buy perfectly working software: http://geocities.com/wall_damion/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mashirle at us.ibm.com Tue Apr 1 03:52:04 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Tue, 01 Apr 2008 03:52:04 -0700 Subject: [ofa-general] Re: [RFC][1/2] IPoIB UD 4K MTU support In-Reply-To: <47E61B05.2020003@voltaire.com> References: <1206005880.8399.20.camel@localhost.localdomain> <47E61B05.2020003@voltaire.com> Message-ID: <1207047124.4593.38.camel@localhost.localdomain> Hello Or, Thanks for your review. > Reading ipoib_ud_skb_put_frags below and its usage in the patch that follows, it's unclear to me if IPOIB_UD_MAX_PAYLOAD is being made of (4K - IPOIB_ENCAP_LEN) + IPOIB_ENCAP_LEN or from adjustment to some IP header alignment constraint. Specifically, the design I'd like to see here is that the IPoIB header telling the type of the frame (ARP, IPv4, IPv6, etc) is provided up to the stack as part of the packet in the skb (e.g. it's very useful with tcpdump/etc filters). The max payload is the max IB MTU here. It's 4K. IPoIB MTU = IB MTU - IPoIB header = 4K - 4. > Reading earlier threads I see that Roland suggested to allow for up to a 4K-4 MTU towards the stack and use some internal buffer for the GRH, where this buffer can be allocated and dma mapped once and then forgotten about until the driver cleans up, etc. Was there any problem with this approach? The implementation here uses one buffer when PAGE_SIZE is greater than 4K, and two buffers when PAGE_SIZE = 4K. One buffer holds the 4K-4 bytes of IPoIB MTU data; the other is a 44-byte header (GRH header + IPoIB header). I use a generic routine for the IPoIB receive path regardless of MTU size; it significantly reduces the size of the patch. We can't just dma map this combined header (GRH header + IPoIB header) buffer once: the GRH header is discarded, but the IPoIB header is not, right? What Roland suggested before was to have the GRH in one buffer, and the IPoIB header and data in the second buffer.
If we do so, the total size of the second buffer is 4K; with the IP header alignment (12 bytes) added, it will exceed one page, which is the problem we are trying to solve here. > > +static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv, > > + struct sk_buff *skb, > > + unsigned int length) > > +{ > > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > > + skb_frag_t *frag = &skb_shinfo(skb)->frags[0]; > > + /* > > + * There is only two buffers needed for max_payload = 4K, > > + * first buf size is IPOIB_UD_HEAD_SIZE > > + */ > > + skb->tail += IPOIB_UD_HEAD_SIZE; > > + frag->size = length - IPOIB_UD_HEAD_SIZE; > > + skb->data_len += frag->size; > > + skb->truesize += frag->size; > > + skb->len += length; > > + } else > > + skb_put(skb, length); > > + > > +} > > > I fail to follow what this code really wants to do and how it does it. > Is it really necessary to touch "by hand" all the internal skb fields? Since there are only two S/G entries, this approach uses fewer instructions to adjust the length of an skb with fragments to match the received data. You can refer to other driver code; for example, how skb_put_frags() is used in ipoib-cm. > also > this function is called once by ipoib_ib_handle_rx_wc in the patch that > follows, any reason not to make it static over there? It doesn't need to be there; I can move this function to ipoib_ib.c instead. Thanks Shirley From or.gerlitz at gmail.com Tue Apr 1 11:54:26 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Tue, 1 Apr 2008 21:54:26 +0300 Subject: [ofa-general] Re: files preamble In-Reply-To: References: <47E5EF49.9080506@voltaire.com> <47EC06AF.8000309@sun.com> <47EF6B57.40502@voltaire.com> <47EF7054.6020503@voltaire.com> Message-ID: <15ddcffd0804011154o3fa5c18bu552952bbfce5902f@mail.gmail.com> >> Ted H. Kim wrote: >>> For example, it appears addr.c, cma.c, ib_addr.h, rdma_cm.h and >>> rdma_cm_ib.h -- all have the "Common Public License 1.0" On 4/1/08, Kanevsky, Arkady wrote: > > I am very doubtful that you can remove it.
> Some of that is based on earlier work by IBM in DAPL which was submitted > under 3 licenses. > > What?! As far as I know all these files were written by Sean from scratch. Or. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Apr 1 12:41:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 12:41:26 -0700 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: <1207064146.3781.19.camel@mtls03> (Eli Cohen's message of "Tue, 01 Apr 2008 18:35:46 +0300") References: <1205767431.25950.138.camel@mtls03> <1207064146.3781.19.camel@mtls03> Message-ID: > would like me to re-generate the mlx4 LSO patch to match this commit or > would you do the adjustments? Sorry for being so slow. Anyway I did the adjustments as below. I also removed the "reserve" variable and moved the 64 byte extra for LSO into send_wqe_overhead(), since it seemed that the only place where you used send_wqe_overhead() without adding in reserve was actually a bug. I also did various changes other places, and maybe introduced a bug: when I try NPtcp between two systems (once running unmodified 2.6.25-rc8, the other running my for-2.6.26 branch, both with ConnectX with FW 2.3.000), on the side with the LSO patch, I eventually get a "local length error" or "local QP operation err" on a send. It is an LSO send of length 63744 with 17 fragments and an mss of 1992, so it should be segmented into 32 packets. Some of these sends complete successfully but eventually one fails. I'm still debugging but maybe you have some idea? When I get the local QP operation error, I get this in case it helps: local QP operation err (QPN 000048, WQE index affa, vendor syndrome 6f, opcode = 5e) CQE contents 00000048 00000000 00000000 00000000 00000000 00000000 affa6f02 0000005e - R. 
>From 141035c707b81638659ada01f456d066f2b353f7 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Tue, 25 Mar 2008 15:35:12 +0200 Subject: [PATCH] IB/mlx4: Add IPoIB LSO support Add TSO support to the mlx4_ib driver. Signed-off-by: Eli Cohen Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 3 + drivers/infiniband/hw/mlx4/main.c | 2 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 5 ++ drivers/infiniband/hw/mlx4/qp.c | 72 +++++++++++++++++++++++++++++---- drivers/net/mlx4/fw.c | 9 ++++ drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/main.c | 1 + include/linux/mlx4/device.h | 1 + include/linux/mlx4/qp.h | 5 ++ 9 files changed, 90 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index d2e32b0..7d70af7 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -420,6 +420,9 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, case MLX4_OPCODE_BIND_MW: wc->opcode = IB_WC_BIND_MW; break; + case MLX4_OPCODE_LSO: + wc->opcode = IB_WC_LSO; + break; } } else { wc->byte_len = be32_to_cpu(cqe->byte_cnt); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 6ea4746..e9330a0 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -101,6 +101,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props->device_cap_flags |= IB_DEVICE_UD_AV_PORT_ENFORCE; if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IPOIB_CSUM) props->device_cap_flags |= IB_DEVICE_UD_IP_CSUM; + if (dev->dev->caps.max_gso_sz) + props->device_cap_flags |= IB_DEVICE_UD_TSO; props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) & 0xffffff; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 3726e45..3f8bd0a 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -110,6 +110,10 @@ struct mlx4_ib_wq { unsigned tail; }; +enum mlx4_ib_qp_flags { + MLX4_IB_QP_LSO = 1 
<< 0 +}; + struct mlx4_ib_qp { struct ib_qp ibqp; struct mlx4_qp mqp; @@ -129,6 +133,7 @@ struct mlx4_ib_qp { struct mlx4_mtt mtt; int buf_size; struct mutex mutex; + u32 flags; u8 port; u8 alt_port; u8 atomic_rd_en; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 320c25f..8ddb97e 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -71,6 +71,7 @@ enum { static const __be32 mlx4_ib_opcode[] = { [IB_WR_SEND] = __constant_cpu_to_be32(MLX4_OPCODE_SEND), + [IB_WR_LSO] = __constant_cpu_to_be32(MLX4_OPCODE_LSO), [IB_WR_SEND_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_SEND_IMM), [IB_WR_RDMA_WRITE] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE), [IB_WR_RDMA_WRITE_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE_IMM), @@ -242,7 +243,7 @@ static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type) } } -static int send_wqe_overhead(enum ib_qp_type type) +static int send_wqe_overhead(enum ib_qp_type type, u32 flags) { /* * UD WQEs must have a datagram segment. @@ -253,7 +254,8 @@ static int send_wqe_overhead(enum ib_qp_type type) switch (type) { case IB_QPT_UD: return sizeof (struct mlx4_wqe_ctrl_seg) + - sizeof (struct mlx4_wqe_datagram_seg); + sizeof (struct mlx4_wqe_datagram_seg) + + ((flags & MLX4_IB_QP_LSO) ? 64 : 0); case IB_QPT_UC: return sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_raddr_seg); @@ -315,7 +317,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, /* Sanity check SQ size before proceeding */ if (cap->max_send_wr > dev->dev->caps.max_wqes || cap->max_send_sge > dev->dev->caps.max_sq_sg || - cap->max_inline_data + send_wqe_overhead(type) + + cap->max_inline_data + send_wqe_overhead(type, qp->flags) + sizeof (struct mlx4_wqe_inline_seg) > dev->dev->caps.max_sq_desc_sz) return -EINVAL; @@ -329,7 +331,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) + - send_wqe_overhead(type); + send_wqe_overhead(type, qp->flags); /* * Hermon supports shrinking WQEs, such that a single work @@ -394,7 +396,8 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, } qp->sq.max_gs = ((qp->sq_max_wqes_per_wr << qp->sq.wqe_shift) - - send_wqe_overhead(type)) / sizeof (struct mlx4_wqe_data_seg); + send_wqe_overhead(type, qp->flags)) / + sizeof (struct mlx4_wqe_data_seg); qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + (qp->sq.wqe_cnt << qp->sq.wqe_shift); @@ -503,6 +506,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } else { qp->sq_no_prefetch = 0; + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO) + qp->flags |= MLX4_IB_QP_LSO; + err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp); if (err) goto err; @@ -673,7 +679,11 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, struct mlx4_ib_qp *qp; int err; - if (init_attr->create_flags) + /* We only support LSO, and only for kernel UD QPs.
*/ + if (init_attr->create_flags & ~IB_QP_CREATE_IPOIB_UD_LSO) + return ERR_PTR(-EINVAL); + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO && + (pd->uobject || init_attr->qp_type != IB_QPT_UD)) return ERR_PTR(-EINVAL); switch (init_attr->qp_type) { @@ -879,10 +889,15 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, } } - if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI || - ibqp->qp_type == IB_QPT_UD) + if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI) context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; - else if (attr_mask & IB_QP_PATH_MTU) { + else if (ibqp->qp_type == IB_QPT_UD) { + if (qp->flags & MLX4_IB_QP_LSO) + context->mtu_msgmax = (IB_MTU_4096 << 5) | + ilog2(dev->dev->caps.max_gso_sz); + else + context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; + } else if (attr_mask & IB_QP_PATH_MTU) { if (attr->path_mtu < IB_MTU_256 || attr->path_mtu > IB_MTU_4096) { printk(KERN_ERR "path MTU (%u) is invalid\n", attr->path_mtu); @@ -1399,6 +1414,34 @@ static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg) dseg->addr = cpu_to_be64(sg->addr); } +static int build_lso_seg(struct mlx4_lso_seg *wqe, struct ib_send_wr *wr, + struct mlx4_ib_qp *qp, unsigned *lso_seg_len) +{ + unsigned halign = ALIGN(wr->wr.ud.hlen, 16); + + /* + * This is a temporary limitation and will be removed in + * a forthcoming FW release: + */ + if (unlikely(wr->wr.ud.hlen) > 60) + return -EINVAL; + + if (unlikely(!(qp->flags & MLX4_IB_QP_LSO) && + wr->num_sge > qp->sq.max_gs - (halign >> 4))) + return -EINVAL; + + memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen); + + /* make sure LSO header is written before overwriting stamping */ + wmb(); + + wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 | + wr->wr.ud.hlen); + + *lso_seg_len = halign; + return 0; +} + int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr) { @@ -1412,6 +1455,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr, unsigned ind; int uninitialized_var(stamp); int uninitialized_var(size); + unsigned seglen; int i; spin_lock_irqsave(&qp->sq.lock, flags); @@ -1490,6 +1534,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, set_datagram_seg(wqe, wr); wqe += sizeof (struct mlx4_wqe_datagram_seg); size += sizeof (struct mlx4_wqe_datagram_seg) / 16; + + if (wr->opcode == IB_WR_LSO) { + err = build_lso_seg(wqe, wr, qp, &seglen); + if (err) { + *bad_wr = wr; + goto out; + } + wqe += seglen; + size += seglen / 16; + } break; case IB_QPT_SMI: diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index f494c3e..d82f275 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -133,6 +133,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) #define QUERY_DEV_CAP_MAX_AV_OFFSET 0x27 #define QUERY_DEV_CAP_MAX_REQ_QP_OFFSET 0x29 #define QUERY_DEV_CAP_MAX_RES_QP_OFFSET 0x2b +#define QUERY_DEV_CAP_MAX_GSO_OFFSET 0x2d #define QUERY_DEV_CAP_MAX_RDMA_OFFSET 0x2f #define QUERY_DEV_CAP_RSZ_SRQ_OFFSET 0x33 #define QUERY_DEV_CAP_ACK_DELAY_OFFSET 0x35 @@ -215,6 +216,13 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_requester_per_qp = 1 << (field & 0x3f); MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RES_QP_OFFSET); dev_cap->max_responder_per_qp = 1 << (field & 0x3f); + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GSO_OFFSET); + field &= 0x1f; + if (!field) + dev_cap->max_gso_sz = 0; + else + dev_cap->max_gso_sz = 1 << field; + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RDMA_OFFSET); dev_cap->max_rdma_global = 1 << (field & 0x3f); MLX4_GET(field, outbox, QUERY_DEV_CAP_ACK_DELAY_OFFSET); @@ -377,6 +385,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); mlx4_dbg(dev, "Max RQ desc size: %d, max RQ S/G: %d\n", dev_cap->max_rq_desc_sz, dev_cap->max_rq_sg); + mlx4_dbg(dev, "Max GSO size: %d\n", dev_cap->max_gso_sz); 
dump_dev_cap_flags(dev, dev_cap->flags); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index e16dec8..306cb9b 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -96,6 +96,7 @@ struct mlx4_dev_cap { u8 bmme_flags; u32 reserved_lkey; u64 max_icm_sz; + int max_gso_sz; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 08bfc13..7cfbe75 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -159,6 +159,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.page_size_cap = ~(u32) (dev_cap->min_page_sz - 1); dev->caps.flags = dev_cap->flags; dev->caps.stat_rate_support = dev_cap->stat_rate_support; + dev->caps.max_gso_sz = dev_cap->max_gso_sz; return 0; } diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 6cdf813..ff7df1a 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -186,6 +186,7 @@ struct mlx4_caps { u32 flags; u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; + int max_gso_sz; }; struct mlx4_buf_list { diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index 31f9eb3..cf0bf4e 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -219,6 +219,11 @@ struct mlx4_wqe_datagram_seg { __be32 reservd[2]; }; +struct mlx4_lso_seg { + __be32 mss_hdr_size; + __be32 header[0]; +}; + struct mlx4_wqe_bind_seg { __be32 flags1; __be32 flags2; -- 1.5.4.5 From rdreier at cisco.com Tue Apr 1 12:41:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 12:41:26 -0700 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: <1207064146.3781.19.camel@mtls03> (Eli Cohen's message of "Tue, 01 Apr 2008 18:35:46 +0300") References: <1205767431.25950.138.camel@mtls03> <1207064146.3781.19.camel@mtls03> Message-ID: > would like me to re-generate the mlx4 LSO patch to match this commit or > would you do the adjustments? Sorry for being so slow. 
Anyway I did the adjustments as below. I also removed the "reserve" variable and moved the 64 byte extra for LSO into send_wqe_overhead(), since it seemed that the only place where you used send_wqe_overhead() without adding in reserve was actually a bug. I also did various changes in other places, and maybe introduced a bug: when I try NPtcp between two systems (one running unmodified 2.6.25-rc8, the other running my for-2.6.26 branch, both with ConnectX with FW 2.3.000), on the side with the LSO patch, I eventually get a "local length error" or "local QP operation err" on a send. It is an LSO send of length 63744 with 17 fragments and an mss of 1992, so it should be segmented into 32 packets. Some of these sends complete successfully but eventually one fails. I'm still debugging but maybe you have some idea? When I get the local QP operation error, I get this in case it helps: local QP operation err (QPN 000048, WQE index affa, vendor syndrome 6f, opcode = 5e) CQE contents 00000048 00000000 00000000 00000000 00000000 00000000 affa6f02 0000005e - R. >From 141035c707b81638659ada01f456d066f2b353f7 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Tue, 25 Mar 2008 15:35:12 +0200 Subject: [PATCH] IB/mlx4: Add IPoIB LSO support Add TSO support to the mlx4_ib driver.
Signed-off-by: Eli Cohen Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 3 + drivers/infiniband/hw/mlx4/main.c | 2 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 5 ++ drivers/infiniband/hw/mlx4/qp.c | 72 +++++++++++++++++++++++++++++---- drivers/net/mlx4/fw.c | 9 ++++ drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/main.c | 1 + include/linux/mlx4/device.h | 1 + include/linux/mlx4/qp.h | 5 ++ 9 files changed, 90 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index d2e32b0..7d70af7 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -420,6 +420,9 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, case MLX4_OPCODE_BIND_MW: wc->opcode = IB_WC_BIND_MW; break; + case MLX4_OPCODE_LSO: + wc->opcode = IB_WC_LSO; + break; } } else { wc->byte_len = be32_to_cpu(cqe->byte_cnt); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 6ea4746..e9330a0 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -101,6 +101,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props->device_cap_flags |= IB_DEVICE_UD_AV_PORT_ENFORCE; if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IPOIB_CSUM) props->device_cap_flags |= IB_DEVICE_UD_IP_CSUM; + if (dev->dev->caps.max_gso_sz) + props->device_cap_flags |= IB_DEVICE_UD_TSO; props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) & 0xffffff; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 3726e45..3f8bd0a 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -110,6 +110,10 @@ struct mlx4_ib_wq { unsigned tail; }; +enum mlx4_ib_qp_flags { + MLX4_IB_QP_LSO = 1 << 0 +}; + struct mlx4_ib_qp { struct ib_qp ibqp; struct mlx4_qp mqp; @@ -129,6 +133,7 @@ struct mlx4_ib_qp { struct mlx4_mtt mtt; int buf_size; struct mutex mutex; + u32 flags; u8 port; u8 alt_port; u8 
atomic_rd_en; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 320c25f..8ddb97e 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -71,6 +71,7 @@ enum { static const __be32 mlx4_ib_opcode[] = { [IB_WR_SEND] = __constant_cpu_to_be32(MLX4_OPCODE_SEND), + [IB_WR_LSO] = __constant_cpu_to_be32(MLX4_OPCODE_LSO), [IB_WR_SEND_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_SEND_IMM), [IB_WR_RDMA_WRITE] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE), [IB_WR_RDMA_WRITE_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE_IMM), @@ -242,7 +243,7 @@ static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type) } } -static int send_wqe_overhead(enum ib_qp_type type) +static int send_wqe_overhead(enum ib_qp_type type, u32 flags) { /* * UD WQEs must have a datagram segment. @@ -253,7 +254,8 @@ static int send_wqe_overhead(enum ib_qp_type type) switch (type) { case IB_QPT_UD: return sizeof (struct mlx4_wqe_ctrl_seg) + - sizeof (struct mlx4_wqe_datagram_seg); + sizeof (struct mlx4_wqe_datagram_seg) + + ((flags & MLX4_IB_QP_LSO) ? 64 : 0); case IB_QPT_UC: return sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_raddr_seg); @@ -315,7 +317,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, /* Sanity check SQ size before proceeding */ if (cap->max_send_wr > dev->dev->caps.max_wqes || cap->max_send_sge > dev->dev->caps.max_sq_sg || - cap->max_inline_data + send_wqe_overhead(type) + + cap->max_inline_data + send_wqe_overhead(type, qp->flags) + sizeof (struct mlx4_wqe_inline_seg) > dev->dev->caps.max_sq_desc_sz) return -EINVAL; @@ -329,7 +331,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) + - send_wqe_overhead(type); + send_wqe_overhead(type, qp->flags); /* * Hermon supports shrinking WQEs, such that a single work @@ -394,7 +396,8 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, } qp->sq.max_gs = ((qp->sq_max_wqes_per_wr << qp->sq.wqe_shift) - - send_wqe_overhead(type)) / sizeof (struct mlx4_wqe_data_seg); + send_wqe_overhead(type, qp->flags)) / + sizeof (struct mlx4_wqe_data_seg); qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + (qp->sq.wqe_cnt << qp->sq.wqe_shift); @@ -503,6 +506,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } else { qp->sq_no_prefetch = 0; + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO) + qp->flags |= MLX4_IB_QP_LSO; + err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp); if (err) goto err; @@ -673,7 +679,11 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, struct mlx4_ib_qp *qp; int err; - if (init_attr->create_flags) + /* We only support LSO, and only for kernel UD QPs.
*/ + if (init_attr->create_flags & ~IB_QP_CREATE_IPOIB_UD_LSO) + return ERR_PTR(-EINVAL); + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO && + (pd->uobject || init_attr->qp_type != IB_QPT_UD)) return ERR_PTR(-EINVAL); switch (init_attr->qp_type) { @@ -879,10 +889,15 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, } } - if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI || - ibqp->qp_type == IB_QPT_UD) + if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI) context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; - else if (attr_mask & IB_QP_PATH_MTU) { + else if (ibqp->qp_type == IB_QPT_UD) { + if (qp->flags & MLX4_IB_QP_LSO) + context->mtu_msgmax = (IB_MTU_4096 << 5) | + ilog2(dev->dev->caps.max_gso_sz); + else + context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; + } else if (attr_mask & IB_QP_PATH_MTU) { if (attr->path_mtu < IB_MTU_256 || attr->path_mtu > IB_MTU_4096) { printk(KERN_ERR "path MTU (%u) is invalid\n", attr->path_mtu); @@ -1399,6 +1414,34 @@ static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg) dseg->addr = cpu_to_be64(sg->addr); } +static int build_lso_seg(struct mlx4_lso_seg *wqe, struct ib_send_wr *wr, + struct mlx4_ib_qp *qp, unsigned *lso_seg_len) +{ + unsigned halign = ALIGN(wr->wr.ud.hlen, 16); + + /* + * This is a temporary limitation and will be removed in + * a forthcoming FW release: + */ + if (unlikely(wr->wr.ud.hlen > 60)) + return -EINVAL; + + if (unlikely(!(qp->flags & MLX4_IB_QP_LSO) && + wr->num_sge > qp->sq.max_gs - (halign >> 4))) + return -EINVAL; + + memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen); + + /* make sure LSO header is written before overwriting stamping */ + wmb(); + + wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 | + wr->wr.ud.hlen); + + *lso_seg_len = halign; + return 0; +} + int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr) { @@ -1412,6 +1455,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
ib_send_wr *wr, unsigned ind; int uninitialized_var(stamp); int uninitialized_var(size); + unsigned seglen; int i; spin_lock_irqsave(&qp->sq.lock, flags); @@ -1490,6 +1534,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, set_datagram_seg(wqe, wr); wqe += sizeof (struct mlx4_wqe_datagram_seg); size += sizeof (struct mlx4_wqe_datagram_seg) / 16; + + if (wr->opcode == IB_WR_LSO) { + err = build_lso_seg(wqe, wr, qp, &seglen); + if (err) { + *bad_wr = wr; + goto out; + } + wqe += seglen; + size += seglen / 16; + } break; case IB_QPT_SMI: diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index f494c3e..d82f275 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -133,6 +133,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) #define QUERY_DEV_CAP_MAX_AV_OFFSET 0x27 #define QUERY_DEV_CAP_MAX_REQ_QP_OFFSET 0x29 #define QUERY_DEV_CAP_MAX_RES_QP_OFFSET 0x2b +#define QUERY_DEV_CAP_MAX_GSO_OFFSET 0x2d #define QUERY_DEV_CAP_MAX_RDMA_OFFSET 0x2f #define QUERY_DEV_CAP_RSZ_SRQ_OFFSET 0x33 #define QUERY_DEV_CAP_ACK_DELAY_OFFSET 0x35 @@ -215,6 +216,13 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_requester_per_qp = 1 << (field & 0x3f); MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RES_QP_OFFSET); dev_cap->max_responder_per_qp = 1 << (field & 0x3f); + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GSO_OFFSET); + field &= 0x1f; + if (!field) + dev_cap->max_gso_sz = 0; + else + dev_cap->max_gso_sz = 1 << field; + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RDMA_OFFSET); dev_cap->max_rdma_global = 1 << (field & 0x3f); MLX4_GET(field, outbox, QUERY_DEV_CAP_ACK_DELAY_OFFSET); @@ -377,6 +385,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); mlx4_dbg(dev, "Max RQ desc size: %d, max RQ S/G: %d\n", dev_cap->max_rq_desc_sz, dev_cap->max_rq_sg); + mlx4_dbg(dev, "Max GSO size: %d\n", dev_cap->max_gso_sz); 
dump_dev_cap_flags(dev, dev_cap->flags); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index e16dec8..306cb9b 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -96,6 +96,7 @@ struct mlx4_dev_cap { u8 bmme_flags; u32 reserved_lkey; u64 max_icm_sz; + int max_gso_sz; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 08bfc13..7cfbe75 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -159,6 +159,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.page_size_cap = ~(u32) (dev_cap->min_page_sz - 1); dev->caps.flags = dev_cap->flags; dev->caps.stat_rate_support = dev_cap->stat_rate_support; + dev->caps.max_gso_sz = dev_cap->max_gso_sz; return 0; } diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 6cdf813..ff7df1a 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -186,6 +186,7 @@ struct mlx4_caps { u32 flags; u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; + int max_gso_sz; }; struct mlx4_buf_list { diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index 31f9eb3..cf0bf4e 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -219,6 +219,11 @@ struct mlx4_wqe_datagram_seg { __be32 reservd[2]; }; +struct mlx4_lso_seg { + __be32 mss_hdr_size; + __be32 header[0]; +}; + struct mlx4_wqe_bind_seg { __be32 flags1; __be32 flags2; -- 1.5.4.5 From sean.hefty at intel.com Tue Apr 1 12:50:15 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 1 Apr 2008 12:50:15 -0700 Subject: [ofa-general] RE: the port numbers in some of the rdmacm examples is a fixed value In-Reply-To: <47F2324C.9060002@dev.mellanox.co.il> References: <47EBBC81.4030501@dev.mellanox.co.il> <000101c89022$ce0b3d30$9c98070a@amr.corp.intel.com> <47EF2A80.1020804@dev.mellanox.co.il> <000101c8934b$265a46e0$37fc070a@amr.corp.intel.com> <47F2324C.9060002@dev.mellanox.co.il> Message-ID: 
<000001c89431$9c3a1660$9b37170a@amr.corp.intel.com> >O.k., I sent you one patch which contains: >1) typo fixes (in test name of error message) + spelling typos >2) start of port support to control the port numbers from the command line >(if you wish, I can supply two different patches) > >Only one minute of work is required to close this issue and fix the port >number support of the udaddy. Thanks - I'll separate the patches and finish them. From rdreier at cisco.com Tue Apr 1 12:59:21 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 12:59:21 -0700 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <1206452112.25950.360.camel@mtls03> (Eli Cohen's message of "Tue, 25 Mar 2008 15:35:12 +0200") References: <1206452112.25950.360.camel@mtls03> Message-ID: > + halign = ALIGN(wr->wr.ud.hlen, 16); This doesn't seem connected to the problem I see, but is this correct? Suppose hlen is 48... then halign will be 48 but it really should be 64 I think. Do we really want halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); instead? - R.
From rdreier at cisco.com Tue Apr 1 13:02:18 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 13:02:18 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) Message-ID: The 2.6.26 merge window will open soon, so it's time to review my plans for when it opens. As usual, patch review by non-me people is always welcome. Anyway, here are all the pending things that I'm aware of. As usual, if something isn't already in my tree and isn't listed below, I probably missed it or dropped it by mistake. Please remind me again in that case. Core: - I did a bunch of cleanups all over drivers/infiniband and the gcc and sparse warning noise is down to a pretty reasonable level. Further cleanups welcome of course. ULPs: - I merged Eli's IPoIB stateless offload changes for checksum offload and LSO. The interrupt moderation changes are next, and should not be a problem to merge. Please test IPoIB on all sorts of hardware! - Shirley's IPoIB 4 KB MTU changes. I expect these to make it in, although I would certainly appreciate review from Eli or anyone else. HW specific: - Vlad's mlx4 resize CQ support. Looks basically OK, so I think we should be able to get it in. - ipath support for 7220 HCAs. I don't expect any issues here once the patches appear. Here are a few topics that I believe will not be ready in time for the 2.6.26 window and will need to wait for 2.6.27 at least: - XRC. I still don't have a good feeling that we have settled on all the nuances of the ABI we want to expose to userspace for this, and ideally I would like to understand how ehca LL QPs fit into the picture as well. - Remove LLTX from IPoIB. I haven't had time to finish this yet, so I guess it will probably wait for 2.6.27 now... - Multiple CQ event vector support.
I still haven't seen any discussions about how ULPs or userspace apps should decide which vector to use, and hence no progress has been made since we deferred this during the 2.6.23 merge window. Here are all the patches I already have in my for-2.6.26 branch: Arthur Jones (4): IB/ipath: Fix sparse warning about pointer signedness IB/ipath: Misc sparse warning cleanup IB/ipath: Provide I/O bus speeds for diagnostic purposes IB/ipath: Fix link up LED display Dave Olson (4): IB/ipath: Make some constants chip-specific, related cleanup IB/ipath: Shared context code needs to be sure device is usable IB/ipath: Enable 4KB MTU IB/ipath: HW workaround for case where chip can send but not receive David Dillow (1): IB/srp: Enforce protocol limit on srp_sg_tablesize Eli Cohen (7): IPoIB: Use checksum offload support if available IB/mlx4: Add IPoIB checksum offload support IB/mthca: Add IPoIB checksum offload support IB/core: Add creation flags to struct ib_qp_init_attr IB/core: Add IPoIB UD LSO support IPoIB: Add LSO support IB/mlx4: Add IPoIB LSO support Harvey Harrison (1): IB: Replace remaining __FUNCTION__ occurrences with __func__ Hoang-Nam Nguyen (1): IB/ehca: Remove tgid checking John Gregor (1): IB/ipath: Head of Line blocking vs forward progress of user apps Julia Lawall (1): RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0 Michael Albaugh (2): IB/ipath: Prevent link-recovery code from negating admin disable IB/ipath: EEPROM support for 7220 devices, robustness improvements, cleanup Ralph Campbell (11): IB/ipath: Fix byte order of pioavail in handle_errors() IB/ipath: Fix error recovery for send buffer status after chip freeze mode IB/ipath: Don't try to handle freeze mode HW errors if diagnostic mode IB/ipath: Make debug error message match the constraint that is checked for IB/ipath: Add code to support multiple link speeds and widths IB/ipath: Remove useless comments IB/ipath: Fix sanity checks on QP number of WRs and SGEs IB/ipath: Change the module author
IB/ipath: Remove some useless (void) casts IB/ipath: Make send buffers available for kernel if not allocated to user IB/ipath: Use PIO buffer for RC ACKs Robert P. J. Day (2): IB: Use shorter list_splice_init() for brevity RDMA/nes: Use more concise list_for_each_entry() Roland Dreier (28): IB/mthca: Formatting cleanups IB/mlx4: Convert "if(foo)" to "if (foo)" mlx4_core: Move opening brace of function onto a new line RDMA/amso1100: Don't use 0UL as a NULL pointer RDMA/cxgb3: IDR IDs are signed IB: Make struct ib_uobject.id a signed int IB/ipath: Fix sparse warning about shadowed symbol IB/mlx4: Endianness annotations IB/cm: Endianness annotations RDMA/ucma: Endian annotation RDMA/nes: Trivial endianness annotations RDMA/nes: Delete unused variables RDMA/amso1100: Start of endianness annotation RDMA/amso1100: Endian annotate mqsq allocator mlx4_core: Fix confusion between mlx4_event and mlx4_dev_event enums IB/uverbs: Don't store struct file * for event files IB/uverbs: Use alloc_file() instead of get_empty_filp() RDMA/nes: Remove redundant NULL check in nes_unregister_ofa_device() RDMA/nes: Remove unused nes_netdev_exit() function RDMA/nes: Use proper format and cast to print dma_addr_t RDMA/nes: Make symbols used only in a single source file static IB/ehca: Make symbols used only in a single source file static IB/core: Add support for "send with invalidate" work requests RDMA/amso1100: Add support for "send with invalidate" work requests IB/mthca: Avoid integer overflow when dealing with profile size IB/mthca: Avoid integer overflow when allocating huge ICM table IB/ipath: Fix PCI config write size used to clear linkctrl error bits RDMA/nes: Remove session_id from nes_cm stuff From sean.hefty at intel.com Tue Apr 1 13:03:32 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 1 Apr 2008 13:03:32 -0700 Subject: [ofa-general] Re: files preamble In-Reply-To: <15ddcffd0804011154o3fa5c18bu552952bbfce5902f@mail.gmail.com> References: 
<47E5EF49.9080506@voltaire.com> <47EC06AF.8000309@sun.com> <47EF6B57.40502@voltaire.com> <47EF7054.6020503@voltaire.com> <15ddcffd0804011154o3fa5c18bu552952bbfce5902f@mail.gmail.com> Message-ID: <000101c89433$76531b20$9b37170a@amr.corp.intel.com> >> I am very doubtful that you can remove it. >> Some of that is based on earlier work by IBM in DAPL which was submitted >> under 3 licenses. > > What?! As far as I know all these files were written by Sean from scratch. The code is not based on DAPL, and was written from scratch. I'm pretty sure that the 3 licenses are simple copy-paste mistakes. I started work on the rdma_cm at the same time that someone else on the list started working on it. (I can't recall who at the moment.) I'm guessing that the original file came with the wrong license for OFA, and I copied it into the other files without bothering to read it all that carefully. I don't know if this is something easily fixed or not. I'd have to search through the mail list archives to get more of the details. - Sean From rdreier at cisco.com Tue Apr 1 13:28:43 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 13:28:43 -0700 Subject: [ofa-general] [PATCH/RFC] IB/mlx4: Micro-optimize mlx4_ib_post_send() Message-ID: Rather than have build_mlx_header() return a negative value on failure and the length of the segments it builds on success, add a pointer parameter to return the length and return 0 on success.
This matches the calling convention used for build_lso_seg() and generates slightly smaller code -- eg, on 64-bit x86: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-19 (-19) function old new delta mlx4_ib_post_send 1999 1980 -19 Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/qp.c | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 8ddb97e..f805e8a 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -1200,7 +1200,7 @@ out: } static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, - void *wqe) + void *wqe, unsigned *mlx_seg_len) { struct ib_device *ib_dev = &to_mdev(sqp->qp.ibqp.device)->ib_dev; struct mlx4_wqe_mlx_seg *mlx = wqe; @@ -1321,7 +1321,9 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, i = 2; } - return ALIGN(i * sizeof (struct mlx4_wqe_inline_seg) + header_size, 16); + *mlx_seg_len = + ALIGN(i * sizeof (struct mlx4_wqe_inline_seg) + header_size, 16); + return 0; } static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq) @@ -1548,15 +1550,13 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, case IB_QPT_SMI: case IB_QPT_GSI: - err = build_mlx_header(to_msqp(qp), wr, ctrl); - if (err < 0) { + err = build_mlx_header(to_msqp(qp), wr, ctrl, &seglen); + if (err) { *bad_wr = wr; goto out; } - wqe += err; - size += err / 16; - - err = 0; + wqe += seglen; + size += seglen / 16; break; default: -- 1.5.4.5 From rdreier at cisco.com Tue Apr 1 13:39:21 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 13:39:21 -0700 Subject: [ofa-general] Re: IB/core: Add creation flags to QPs In-Reply-To: (Hoang-Nam Nguyen's message of "Fri, 28 Mar 2008 19:53:07 +0100") References: Message-ID: > What is your recommendation wrt/ encoding scheme for qp_type and > create_flags? 
I don't think I know enough to make a pronouncement yet. Maybe someone can summarize the possibilities and see how they work for XRC, ehca LL, block-loopback, etc? Bumping ABI is painful but on the other hand an explosion of new verbs is ugly. So it's all going to be a tradeoff. - R. From clameter at sgi.com Tue Apr 1 13:55:33 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:33 -0700 Subject: [ofa-general] [patch 2/9] Move tlb flushing into free_pgtables References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205636.048829606@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: move_tlb_flush URL: From clameter at sgi.com Tue Apr 1 13:55:31 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:31 -0700 Subject: [ofa-general] [patch 0/9] [RFC] EMM Notifier V2 Message-ID: <20080401205531.986291575@sgi.com> [Note that I will be giving talks next week at the OpenFabrics Forum and at the Linux Collab Summit in Austin on memory pinning etc. It would be great if I could get some feedback on the approach then] V1->V2: - Additional optimizations in the VM - Convert vm spinlocks to rw sems. - Add XPMEM driver (requires sleeping in callbacks) - Add XPMEM example This patch implements a simple callback for device drivers that establish their own references to pages (KVM, GRU, XPmem, RDMA/Infiniband, DMA engines etc). These references are unknown to the VM (therefore external). With these callbacks it is possible for the device driver to release external references when the VM requests it. This enables swapping, page migration and allows support of remapping, permission changes etc etc for the externally mapped memory. With this functionality it becomes also possible to avoid pinning or mlocking pages (commonly done to stop the VM from unmapping device mapped pages). 
A device driver must subscribe to a process using emm_register_notifier(struct emm_notifier *, struct mm_struct *). The VM will then perform callbacks for operations that unmap or change permissions of pages in that address space. When the process terminates, the callback function is called with emm_release. Callbacks are performed before and after the unmapping action of the VM: emm_invalidate_start before, emm_invalidate_end after. The device driver must hold off establishing new references to pages in the specified range between a callback with emm_invalidate_start and the subsequent callback with emm_invalidate_end. This allows the VM to ensure that no concurrent driver actions are performed on an address range while it performs remapping or unmapping operations. This patchset contains additional modifications needed to ensure that the callbacks can sleep. For that purpose two key locks in the VM need to be converted to rw_sems. These patches are brand new, invasive and need extensive discussion and evaluation. The first patch alone may be applied if callbacks in atomic context are sufficient for a device driver (likely the case for KVM, GRU and simple DMA drivers). Following the VM modifications is the XPMEM device driver that allows sharing of memory between processes running on different instances of Linux. This is also a prototype. It is known to run the trivial sample programs included as the last patch. -- From clameter at sgi.com Tue Apr 1 13:55:34 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:34 -0700 Subject: [ofa-general] [patch 3/9] Convert i_mmap_lock to i_mmap_sem References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205636.312140500@sgi.com> An embedded and charset-unspecified text was scrubbed...
Name: emm_immap_sem URL: From clameter at sgi.com Tue Apr 1 13:55:38 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:38 -0700 Subject: [ofa-general] [patch 7/9] Locking rules for taking multiple mmap_sem locks. References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205637.230854375@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_v003_lock-rule URL: From clameter at sgi.com Tue Apr 1 13:55:32 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:32 -0700 Subject: [ofa-general] [patch 1/9] EMM Notifier: The notifier calls References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205635.793766935@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_notifier URL: From clameter at sgi.com Tue Apr 1 13:55:37 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:37 -0700 Subject: [ofa-general] [patch 6/9] This patch exports zap_page_range as it is needed by XPMEM. References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205637.025425911@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_v003_export-zap_page_range URL: From clameter at sgi.com Tue Apr 1 13:55:36 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:36 -0700 Subject: [ofa-general] [patch 5/9] Convert anon_vma lock to rw_sem and refcount References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205636.777127252@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_anon_vma_sem URL: From clameter at sgi.com Tue Apr 1 13:55:40 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:40 -0700 Subject: [ofa-general] [patch 9/9] XPMEM: Simple example References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205637.839049326@sgi.com> An embedded and charset-unspecified text was scrubbed... 
Name: xpmem_test URL: From clameter at sgi.com Tue Apr 1 13:55:35 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:35 -0700 Subject: [ofa-general] [patch 4/9] Remove tlb pointer from the parameters of unmap vmas References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205636.524832964@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: cleanup_unmap_vmas URL: From clameter at sgi.com Tue Apr 1 13:55:39 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:39 -0700 Subject: [ofa-general] [patch 8/9] XPMEM: The device driver References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205637.474020250@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_v003_emm_SSI_v3 URL: From rdreier at cisco.com Tue Apr 1 14:24:09 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 14:24:09 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs Message-ID: In kernel commit c80cf84d ("IB/core: Add support for "send with invalidate" work requests"), which is currently queued for 2.6.26, I added support for send with invalidate work requests on the kernel side of things. This patch adds the matching support to libibverbs. There is one part that's a bit tricky: in ibv_cmd_query_device(), I added a bit of code to move IBV_DEVICE_SEND_W_INV to the reserved bit where it used to be. This is to make sure that the userspace low-level driver for the device in question really supports send with invalidate.
To see why this is necessary, suppose that we didn't do this and a user had a system with - a new kernel with a low-level driver that sets the IB_DEVICE_SEND_W_INV bit - a new libibverbs with send with invalidate support - an old userspace driver that has no send with invalidate support In this case send with invalidate requests would be silently turned into plain send requests with no way for an application to know this. With the approach in my patch, the application will not see IBV_DEVICE_SEND_W_INV set and hence should not use send with invalidate requests. This scheme means that low-level drivers that support send with invalidate should add some autoconf code that checks if IBV_DEVICE_KERNEL_SEND_W_INV is defined, and if so, compile in code in the query_device method that sets IBV_DEVICE_SEND_W_INV if ibv_cmd_query_device() returns IBV_DEVICE_KERNEL_SEND_W_INV set. This patch also adds enum values for a few more device capability bits defined in the kernel. Does this approach make sense to people? --- diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h index 0db083a..ee799bb 100644 --- a/include/infiniband/kern-abi.h +++ b/include/infiniband/kern-abi.h @@ -592,6 +592,10 @@ struct ibv_kern_send_wr { __u32 remote_qkey; __u32 reserved; } ud; + struct { + __u32 rkey; + __u32 reserved; + } invalidate; } wr; }; diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index a51bb9d..679386a 100644 --- a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -92,7 +92,18 @@ enum ibv_device_cap_flags { IBV_DEVICE_SYS_IMAGE_GUID = 1 << 11, IBV_DEVICE_RC_RNR_NAK_GEN = 1 << 12, IBV_DEVICE_SRQ_RESIZE = 1 << 13, - IBV_DEVICE_N_NOTIFY_CQ = 1 << 14 + IBV_DEVICE_N_NOTIFY_CQ = 1 << 14, + IBV_DEVICE_ZERO_STAG = 1 << 15, + /* + * IBV_DEVICE_KERNEL_SEND_W_INV is used by libibverbs to + * signal to low-level driver libraries that the kernel set + * the "send with invalidate" capability bit.
Applications + * should only test IBV_DEVICE_SEND_W_INV and never look at + * IBV_DEVICE_KERNEL_SEND_W_INV. + */ + IBV_DEVICE_KERNEL_SEND_W_INV = 1 << 16, + IBV_DEVICE_MEM_WINDOW = 1 << 17, + IBV_DEVICE_SEND_W_INV = 1 << 21 }; enum ibv_atomic_cap { @@ -492,7 +503,8 @@ enum ibv_send_flags { IBV_SEND_FENCE = 1 << 0, IBV_SEND_SIGNALED = 1 << 1, IBV_SEND_SOLICITED = 1 << 2, - IBV_SEND_INLINE = 1 << 3 + IBV_SEND_INLINE = 1 << 3, + IBV_SEND_INVALIDATE = 1 << 6 }; struct ibv_sge { @@ -525,6 +537,9 @@ struct ibv_send_wr { uint32_t remote_qpn; uint32_t remote_qkey; } ud; + struct { + uint32_t rkey; + } invalidate; } wr; }; diff --git a/src/cmd.c b/src/cmd.c index 9db8aa6..3e0ff0a 100644 --- a/src/cmd.c +++ b/src/cmd.c @@ -159,6 +159,17 @@ int ibv_cmd_query_device(struct ibv_context *context, device_attr->local_ca_ack_delay = resp.local_ca_ack_delay; device_attr->phys_port_cnt = resp.phys_port_cnt; + /* + * If the kernel driver says that it supports send with + * invalidate work requests, then move the flag to + * IBV_DEVICE_KERNEL_SEND_W_INV so that the low-level driver + * gets a chance to make sure it supports the operation as well. + */ + if (device_attr->device_cap_flags & IBV_DEVICE_SEND_W_INV) { + device_attr->device_cap_flags &= ~IBV_DEVICE_SEND_W_INV; + device_attr->device_cap_flags |= ~IBV_DEVICE_KERNEL_SEND_W_INV; + } + return 0; } @@ -859,6 +870,11 @@ int ibv_cmd_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, i->wr.rdma.remote_addr; tmp->wr.rdma.rkey = i->wr.rdma.rkey; break; + case IBV_WR_SEND: + case IBV_WR_SEND_WITH_IMM: + tmp->wr.invalidate.rkey = + i->wr.invalidate.rkey; + break; case IBV_WR_ATOMIC_CMP_AND_SWP: case IBV_WR_ATOMIC_FETCH_AND_ADD: tmp->wr.atomic.remote_addr = From rdreier at cisco.com Tue Apr 1 14:37:04 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 14:37:04 -0700 Subject: [ofa-general] Distribution packaging? 
(was: [ewg] Re: [ANNOUNCE] librdmacm release 1.0.7) In-Reply-To: (Roland Dreier's message of "Sat, 29 Mar 2008 16:33:17 -0700") References: <000001c89105$a3da0ee0$b137170a@amr.corp.intel.com> Message-ID: By the way, the current status of my Debian and Fedora packaging efforts for userspace code that I use is the following: libibverbs: libmthca: libmlx4: librdmacm: Up-to-date packages included in Debian and Fedora. libipathverbs: I have Debian packaging prepared and I will probably submit it for inclusion in Debian soon. The spec file looks like it would only need minor changes for Fedora inclusion and if I have spare time I may work on getting it into Fedora (I use Debian for development but I'm not a Fedora user so my motivation for working on Fedora packages is not that great). libcxgb3: Current tarball release (1.1.4) is a snapshot of the raw development tree, not the output of "make dist". This makes packaging ugly. I have Debian packaging ready and the spec file looks close to what is needed for Fedora, so once a good release appears it shouldn't be too hard to get into distributions. libnes: No tarball release available. Same implication as libcxgb3: I have Debian packages ready to go once a good release appears, and the spec file probably wouldn't need too much work. Do other people find this work useful? I personally really like being able to install a new system and get up-to-date userspace packages without having to mess around with OFED or building by hand, and of course being able to do "aptitude upgrade" to update the versions on a system is very nice. If there is value to this, then it would be nice if I could get "official" releases made with "make dist" from the libcxgb3 and libnes maintainers -- this makes the job of getting packages into the upstream distribution much simpler. Also, since I am not much of a Fedora person, I wouldn't mind if other people claimed the job of getting packages into Fedora.
There is excellent step-by-step documentation at http://fedoraproject.org/wiki/PackageMaintainers/Join - R. From sean.hefty at intel.com Tue Apr 1 15:10:05 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 1 Apr 2008 15:10:05 -0700 Subject: [ofa-general] RE: Distribution packaging? (was: [ewg] Re: [ANNOUNCE] librdmacm release 1.0.7) In-Reply-To: References: <000001c89105$a3da0ee0$b137170a@amr.corp.intel.com> Message-ID: <001001c89445$23ff3040$9b37170a@amr.corp.intel.com> >Do other people find this work useful?
I personally really like being >able to install a new system and get up-to-date userspace packages >without having to mess around with OFED or building by hand, and of >course being able to do "aptitude upgrade" to update the versions on a >system is very nice. I don't use Fedora or Debian myself, but I appreciate that you create these packages. - Sean From swise at opengridcomputing.com Tue Apr 1 15:21:24 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 01 Apr 2008 17:21:24 -0500 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: References: Message-ID: <47F2B564.10203@opengridcomputing.com> looks ok to me. Roland Dreier wrote: > In kernel commit c80cf84d ("IB/core: Add support for "send with > invalidate" work requests"), which is currently queued for 2.6.26, I > added support for send with invalidate work requests on the kernel side > of things. This patch adds the matching support to libibverbs. > [...] > Does this approach make sense to people? > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From akepner at sgi.com Tue Apr 1 15:54:18 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Tue, 1 Apr 2008 15:54:18 -0700 Subject: [ofa-general] Re: [PATCH] libibmad/dump: support VLArb table size, fix printing In-Reply-To: <20080329121252.GY13708@sashak.voltaire.com> References: <20080329121252.GY13708@sashak.voltaire.com> Message-ID: <20080401225418.GF29410@sgi.com> On Sat, Mar 29, 2008 at 12:12:52PM +0000, Sasha Khapyorsky wrote: > > Add support for VLArb table size.
Fix printing, eliminate intermediate > buffers, some other cleanups. > > Signed-off-by: Sasha Khapyorsky > --- > > Arthur, could you try this? > .... Tested-by: Arthur Kepner Yes, I tried it (along with the infiniband-diags patch) and that fixes things. Thanks! Before the patch was applied, I'd get: # smpquery vlarb 2 # VLArbitration tables: Lid 2 port 0 LowCap 8 HighCap 8 # Low priority VL Arbitration Table: VL : |0x3 | WEIGHT: |0x3 | But the tables as reported by smpdump looked OK - the weird weights here are for experimentation, and they are correct: # smpdump 2 0x18 0x00010000 0000 0101 0201 0301 0003 0103 0203 0303 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 # smpdump 2 0x18 0x00030000 0002 0102 0202 0302 0008 0108 0208 0308 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 After the patch is applied, smpquery does what I'd expect: # smpquery vlarb 2 # VLArbitration tables: Lid 2 port 0 LowCap 8 HighCap 8 # Low priority VL Arbitration Table: VL : |0x0 |0x1 |0x2 |0x3 |0x0 |0x1 |0x2 |0x3 | WEIGHT: |0x0 |0x1 |0x1 |0x1 |0x3 |0x3 |0x3 |0x3 | # High priority VL Arbitration Table: VL : |0x0 |0x1 |0x2 |0x3 |0x0 |0x1 |0x2 |0x3 | WEIGHT: |0x2 |0x2 |0x2 |0x2 |0x8 |0x8 |0x8 |0x8 | -- Arthur From mashirle at us.ibm.com Tue Apr 1 09:55:55 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Tue, 01 Apr 2008 09:55:55 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: Message-ID: <1207068955.4593.45.camel@localhost.localdomain> On Tue, 2008-04-01 at 13:02 -0700, Roland Dreier wrote: > - Multiple CQ event vector support. I still haven't seen any > discussions about how ULPs or userspace apps should decide which > vector to use, and hence no progress has been made since we > deferred this during the 2.6.23 merge window. 
I did some prototyping for IPoIB to enable multiple CQ event support. I did see that the approach improved multiple link aggregation performance. I also see some customers' requirements in userspace. I will start the discussion as soon as possible, but it would most likely miss the 2.6.26 window. Thanks Shirley From rdreier at cisco.com Tue Apr 1 20:41:57 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 20:41:57 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: (Roland Dreier's message of "Tue, 01 Apr 2008 14:24:09 -0700") References: Message-ID: > @@ -525,6 +537,9 @@ struct ibv_send_wr { > uint32_t remote_qpn; > uint32_t remote_qkey; > } ud; > + struct { > + uint32_t rkey; > + } invalidate; > } wr; > }; Thinking about this a bit further... this doesn't work for iWARP "RDMA read with invalidate" work requests, since this is inside a union, so the invalidate rkey and the RDMA read remote_addr fields stomp on each other. And since we have to figure out how to marshal this into the kernel, that is a bit of a problem. Does anyone see a problem with putting the invalidate rkey inside the rdma part of the wr union as a new field? That is, @@ -513,6 +525,7 @@ struct ibv_send_wr { struct { uint64_t remote_addr; uint32_t rkey; + uint32_t invalidate_rkey; } rdma; and similar on the kernel side? (And there is no invalidate member of this union added) - R.
From rdreier at cisco.com Tue Apr 1 20:51:08 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 20:51:08 -0700 Subject: [ofa-general] Re: [PATCH] core: check optional verbs before using them In-Reply-To: <200803311750.02916.dotanb@dev.mellanox.co.il> (Dotan Barak's message of "Mon, 31 Mar 2008 17:50:02 +0300") References: <200803311750.02916.dotanb@dev.mellanox.co.il> Message-ID: > Check that all optional verbs are implemented in the device > before using them. Some parts make sense, eg: > @@ -248,7 +248,9 @@ int ib_modify_srq(struct ib_srq *srq, > struct ib_srq_attr *srq_attr, > enum ib_srq_attr_mask srq_attr_mask) > { > - return srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL); > + return srq->device->modify_srq ? > + srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL) : > + -ENOSYS; on the other hand: > @@ -265,6 +267,9 @@ int ib_destroy_srq(struct ib_srq *srq) > struct ib_pd *pd; > int ret; > > + if (!srq->device->destroy_srq) > + return -ENOSYS; > + I think it's safe to assume that a driver that allows SRQs to be created will allow them to be destroyed, and code that destroys a non-existent SRQ is buggy. So I don't think this is worth it. Same for dealloc MW and dealloc FMR. The reg_phys_mr change is sane too. So I applied this: commit 3926318b1e52568b10a9275b34e0a1fdef6c10e8 Author: Dotan Barak Date: Mon Mar 31 17:50:02 2008 +0300 IB/core: Check optional verbs before using them Make sure that a device implements the modify_srq and reg_phys_mr optional methods before calling them. 
Signed-off-by: Dotan Barak Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 86ed8af..8ffb5f2 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -248,7 +248,9 @@ int ib_modify_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr, enum ib_srq_attr_mask srq_attr_mask) { - return srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL); + return srq->device->modify_srq ? + srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL) : + -ENOSYS; } EXPORT_SYMBOL(ib_modify_srq); @@ -672,6 +674,9 @@ struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, { struct ib_mr *mr; + if (!pd->device->reg_phys_mr) + return ERR_PTR(-ENOSYS); + mr = pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, mr_access_flags, iova_start); From dotanb at dev.mellanox.co.il Tue Apr 1 23:10:17 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 02 Apr 2008 09:10:17 +0300 Subject: [ofa-general] Re: the port numbers in some of the rdmacm examples is a fixed value In-Reply-To: <000001c89431$9c3a1660$9b37170a@amr.corp.intel.com> References: <47EBBC81.4030501@dev.mellanox.co.il> <000101c89022$ce0b3d30$9c98070a@amr.corp.intel.com> <47EF2A80.1020804@dev.mellanox.co.il> <000101c8934b$265a46e0$37fc070a@amr.corp.intel.com> <47F2324C.9060002@dev.mellanox.co.il> <000001c89431$9c3a1660$9b37170a@amr.corp.intel.com> Message-ID: <47F32349.3080409@dev.mellanox.co.il> Sean Hefty wrote: >> O.k., I sent you one patch which contains: >> 1) typo fixes (in test name of error message) + spelling typos >> 2) start of port support to control the port numbers from the command line >> (if you wish, i can supply two different patches) >> >> Only a minute of work is required to close this issue and fix the port >> number support of the udaddy. >> > > Thanks - I'll separate the patches and finish them. > > great, thanks. Dotan From dotanb at dev.mellanox.co.il Tue Apr 1 23:11:53 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 02 Apr 2008 09:11:53 +0300 Subject: [ofa-general] Re: [PATCH] core: check optional verbs before using them In-Reply-To: References: <200803311750.02916.dotanb@dev.mellanox.co.il> Message-ID: <47F323A9.5020701@dev.mellanox.co.il> I would like to protect against buggy SW as well, but you are right - kernel coding is for people who know what they are doing... thanks Dotan Roland Dreier wrote: > > Check that all optional verbs are implemented in the device > > before using them. > > Some parts make sense, eg: > > > @@ -248,7 +248,9 @@ int ib_modify_srq(struct ib_srq *srq, > > struct ib_srq_attr *srq_attr, > > enum ib_srq_attr_mask srq_attr_mask) > > { > > - return srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL); > > + return srq->device->modify_srq ?
> > + srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL) : > > + -ENOSYS; > > on the other hand: > > > @@ -265,6 +267,9 @@ int ib_destroy_srq(struct ib_srq *srq) > > struct ib_pd *pd; > > int ret; > > > > + if (!srq->device->destroy_srq) > > + return -ENOSYS; > > + > > I think it's safe to assume that a driver that allows SRQs to be created > will allow them to be destroyed, and code that destroys a non-existent > SRQ is buggy. So I don't think this is worth it. Same for dealloc MW > and dealloc FMR. > > The reg_phys_mr change is sane too. So I applied this: > > commit 3926318b1e52568b10a9275b34e0a1fdef6c10e8 > Author: Dotan Barak > Date: Mon Mar 31 17:50:02 2008 +0300 > > IB/core: Check optional verbs before using them > > Make sure that a device implements the modify_srq and reg_phys_mr > optional methods before calling them. > > Signed-off-by: Dotan Barak > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c > index 86ed8af..8ffb5f2 100644 > --- a/drivers/infiniband/core/verbs.c > +++ b/drivers/infiniband/core/verbs.c > @@ -248,7 +248,9 @@ int ib_modify_srq(struct ib_srq *srq, > struct ib_srq_attr *srq_attr, > enum ib_srq_attr_mask srq_attr_mask) > { > - return srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL); > + return srq->device->modify_srq ? 
> + srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL) : > + -ENOSYS; > } > EXPORT_SYMBOL(ib_modify_srq); > > @@ -672,6 +674,9 @@ struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, > { > struct ib_mr *mr; > > + if (!pd->device->reg_phys_mr) > + return ERR_PTR(-ENOSYS); > + > mr = pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, > mr_access_flags, iova_start); > > > From andrea at qumranet.com Tue Apr 1 23:49:52 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 08:49:52 +0200 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080401205635.793766935@sgi.com> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> Message-ID: <20080402064952.GF19189@duo.random> On Tue, Apr 01, 2008 at 01:55:32PM -0700, Christoph Lameter wrote: > +/* Perform a callback */ > +int __emm_notify(struct mm_struct *mm, enum emm_operation op, > + unsigned long start, unsigned long end) > +{ > + struct emm_notifier *e = rcu_dereference(mm)->emm_notifier; > + int x; > + > + while (e) { > + > + if (e->callback) { > + x = e->callback(e, mm, op, start, end); > + if (x) > + return x; There are much bigger issues besides the rcu safety in this patch; proper aging of the secondary mmu through access bits set by hardware is unfixable with this model (you would need to do age |= e->callback), which is the proof of why this isn't flexible enough by forcing the same parameters and retvals for all methods. No idea why you go for such an inferior solution that will never get the aging right and will likely fall apart if we add more methods in the future. For example the "switch" you have to add in xpmem_emm_notifier_callback doesn't look good; at least gcc may be able to optimize it with an array indexing simulating proper pointers to functions like in #v9. Most other patches will apply cleanly on top of my coming mmu notifiers #v10 that I hope will go in -mm.
For #v10 the only two left open issues to discuss are: 1) the moment you remove rcu_read_lock from the methods (my #v9 had rcu_read_lock so synchronize_rcu() in Jack's patch was working with my #v9) GRU has no way to ensure the methods will fire immediately after registering. To fix this race after removing the rcu_read_lock (to prepare for the later patches that allow the VM to schedule when the mmu notifier methods are invoked) I can replace rcu_read_lock with seqlock locking in the same way as I did in a previous patch posted here (seqlock_write around the registration method, and seqlock_read replaying all callbacks if the race happened). Then synchronize_rcu becomes unnecessary and the methods will be correctly replayed, allowing GRU not to corrupt memory after the registration method. EMM would also need a fix like this for GRU to be safe on top of EMM. Another less obviously safe approach is to allow the register method to succeed only when mm_users=1 and the task is single threaded. This way, if all the places where the mmu notifiers are invoked on the mm not by the current task are only doing invalidates after/before zapping ptes, and if the instantiation of new ptes is single threaded too, we shouldn't worry if we miss an invalidate for a pte that is zero and doesn't point to any physical page. In the places where current->mm != mm I'm using invalidate_page 99% of the time, and that only follows the ptep_clear_flush. The problem is the range_begin calls that will happen before zapping the pte in places where current->mm != mm. Unfortunately in my incremental patch where I move all invalidate_page outside of the PT lock to prepare for allowing sleeping inside the mmu notifiers, I used range_begin/end in places like try_to_unmap_cluster where current->mm != mm. In general this solution looks more fragile than the seqlock. 2) I'm uncertain how the driver can handle a range_end called before range_begin.
Also, multiple range_begin calls can happen in parallel, each later followed by a range_end, so a global seqlock that serializes the secondary mmu page fault will screw up (you can't seqlock_write in range_begin and sequnlock_write in range_end). The write side of the seqlock must be serialized, and calling seqlock_write twice in a row before any sequnlock operation will break. A recursive rwsem taken in range_begin and released in range_end seems to be the only way to stop the secondary mmu page faults. If I removed all range_begin/end in places where current->mm != mm, then I could as well bail out in mmu_notifier_register if mm_users != 1 to solve problem 2 too. My solution is this: I believe the driver is safe skipping a range_end, as long as range_end is followed by an invalidate event like in invalidate_range_end, so the driver can just keep a static value that records whether range_begin has ever happened, and simply return from range_end without doing anything if no range_begin ever happened. Notably I'll be trying to use range_begin in KVM too, so I have to deal with 2) as well. For Nick: the reason for using range_begin is supposedly an optimization, to guarantee that the last free of the page will happen outside the mmu_lock, so KVM, internally to the mmu_lock, is free to do: spin_lock(kvm->mmu_lock); put_page(); spte = nonpresent; flush secondary tlb(); spin_unlock(kvm->mmu_lock). The above ordering is unsafe if the page could ever reach the freelist before the tlb flush happened. The range_begin will take the mmu_lock and hold off new kvm page faults, allowing kvm to free as many pages as it wants, invalidate all ptes and only at the end do a single tlb flush, while still being allowed to madvise(don't need) or munmap parts of the memory mapped by sptes.
It's uncertain if the ordering should be changed to be robust against put_page putting the page in the freelist immediately, instead of using range_begin to serialize against the page going out of ptes immediately after put_page is called. If we go for a range_end-only usage of the mmu notifiers, kvm will need some reordering, and zapping a large number of ptes will require multiple tlb flushes, as the pages have to be pointed to by an array and the array is of limited size (the size of the array decides the frequency of the tlb flushes). The suggested usage of range_begin allows a single tlb flush for an unlimited number of sptes being zapped. From dotanb at dev.mellanox.co.il Wed Apr 2 00:39:35 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 02 Apr 2008 10:39:35 +0300 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: References: Message-ID: <47F33837.60701@dev.mellanox.co.il> Roland Dreier wrote: > diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h > index a51bb9d..679386a 100644 > --- a/include/infiniband/verbs.h > +++ b/include/infiniband/verbs.h > @@ -92,7 +92,18 @@ enum ibv_device_cap_flags { > IBV_DEVICE_SYS_IMAGE_GUID = 1 << 11, > IBV_DEVICE_RC_RNR_NAK_GEN = 1 << 12, > IBV_DEVICE_SRQ_RESIZE = 1 << 13, > - IBV_DEVICE_N_NOTIFY_CQ = 1 << 14 > + IBV_DEVICE_N_NOTIFY_CQ = 1 << 14, > + IBV_DEVICE_ZERO_STAG = 1 << 15, > + /* > + * IBV_DEVICE_KERNEL_SEND_W_INV is used by libibverbs to > + * signal to low-level driver libraries that the kernel set > + * the "send with invalidate" capaibility bit. Applications > + * should only test IBV_DEVICE_SEND_W_INV and never look at > + * IBV_DEVICE_KERNEL_SEND_W_INV. > + */ > + IBV_DEVICE_KERNEL_SEND_W_INV = 1 << 16, > + IBV_DEVICE_MEM_WINDOW = 1 << 17, > + IBV_DEVICE_SEND_W_INV = 1 << 21 > }; > Why do you need the flag IBV_DEVICE_MEM_WINDOW?
If the value of device_attributes.num_mw is more than zero => the device supports memory windows, so I think this flag can be safely removed. > > enum ibv_atomic_cap { > @@ -492,7 +503,8 @@ enum ibv_send_flags { > IBV_SEND_FENCE = 1 << 0, > IBV_SEND_SIGNALED = 1 << 1, > IBV_SEND_SOLICITED = 1 << 2, > - IBV_SEND_INLINE = 1 << 3 > + IBV_SEND_INLINE = 1 << 3, > + IBV_SEND_INVALIDATE = 1 << 6 > }; > I think that the send & invalidate should be a new opcode instead of a send flag. Thanks Dotan From eli at dev.mellanox.co.il Wed Apr 2 03:04:39 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 02 Apr 2008 13:04:39 +0300 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: References: <1205767431.25950.138.camel@mtls03> <1207064146.3781.19.camel@mtls03> Message-ID: <1207130679.3781.50.camel@mtls03> Oof, that was a bad one and the following patch fixes the problem. diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index f805e8a..4eaee27 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -255,7 +255,7 @@ static int send_wqe_overhead(enum ib_qp_type type, u32 flags) case IB_QPT_UD: return sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_datagram_seg) + - (flags & MLX4_IB_QP_LSO) ? 64 : 0; + ((flags & MLX4_IB_QP_LSO) ? 64 : 0); case IB_QPT_UC: return sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_raddr_seg); The explanation is this: since '+' has higher precedence than the '?:' operator, the expression evaluated is: sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_datagram_seg) + (flags & MLX4_IB_QP_LSO) which is always nonzero (hence true), so the value returned is 64. The parentheses around the '?:' give the desired result. On Tue, 2008-04-01 at 12:41 -0700, Roland Dreier wrote: > > would like me to re-generate the mlx4 LSO patch to match this commit or > > would you do the adjustments? > > Sorry for being so slow. > > Anyway I did the adjustments as below.
I also removed the "reserve" > variable and moved the 64 byte extra for LSO into send_wqe_overhead(), > since it seemed that the only place where you used send_wqe_overhead() > without adding in reserve was actually a bug. > > I also made various changes in other places, and maybe introduced a bug: > when I try NPtcp between two systems (one running unmodified > 2.6.25-rc8, the other running my for-2.6.26 branch, both with ConnectX > with FW 2.3.000), on the side with the LSO patch, I eventually get a > "local length error" or "local QP operation err" on a send. It is an > LSO send of length 63744 with 17 fragments and an mss of 1992, so it > should be segmented into 32 packets. Some of these sends complete > successfully but eventually one fails. I'm still debugging but maybe > you have some idea? > > When I get the local QP operation error, I get this in case it helps: > > local QP operation err (QPN 000048, WQE index affa, vendor syndrome 6f, opcode = 5e) > CQE contents 00000048 00000000 00000000 00000000 00000000 00000000 affa6f02 0000005e > > - R. > > From 141035c707b81638659ada01f456d066f2b353f7 Mon Sep 17 00:00:00 2001 > From: Eli Cohen > Date: Tue, 25 Mar 2008 15:35:12 +0200 > Subject: [PATCH] IB/mlx4: Add IPoIB LSO support > > Add TSO support to the mlx4_ib driver.
> > Signed-off-by: Eli Cohen > Signed-off-by: Roland Dreier > --- > drivers/infiniband/hw/mlx4/cq.c | 3 + > drivers/infiniband/hw/mlx4/main.c | 2 + > drivers/infiniband/hw/mlx4/mlx4_ib.h | 5 ++ > drivers/infiniband/hw/mlx4/qp.c | 72 +++++++++++++++++++++++++++++---- > drivers/net/mlx4/fw.c | 9 ++++ > drivers/net/mlx4/fw.h | 1 + > drivers/net/mlx4/main.c | 1 + > include/linux/mlx4/device.h | 1 + > include/linux/mlx4/qp.h | 5 ++ > 9 files changed, 90 insertions(+), 9 deletions(-) > > diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c > index d2e32b0..7d70af7 100644 > --- a/drivers/infiniband/hw/mlx4/cq.c > +++ b/drivers/infiniband/hw/mlx4/cq.c > @@ -420,6 +420,9 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, > case MLX4_OPCODE_BIND_MW: > wc->opcode = IB_WC_BIND_MW; > break; > + case MLX4_OPCODE_LSO: > + wc->opcode = IB_WC_LSO; > + break; > } > } else { > wc->byte_len = be32_to_cpu(cqe->byte_cnt); > diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c > index 6ea4746..e9330a0 100644 > --- a/drivers/infiniband/hw/mlx4/main.c > +++ b/drivers/infiniband/hw/mlx4/main.c > @@ -101,6 +101,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, > props->device_cap_flags |= IB_DEVICE_UD_AV_PORT_ENFORCE; > if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IPOIB_CSUM) > props->device_cap_flags |= IB_DEVICE_UD_IP_CSUM; > + if (dev->dev->caps.max_gso_sz) > + props->device_cap_flags |= IB_DEVICE_UD_TSO; > > props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) & > 0xffffff; > diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h > index 3726e45..3f8bd0a 100644 > --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h > +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h > @@ -110,6 +110,10 @@ struct mlx4_ib_wq { > unsigned tail; > }; > > +enum mlx4_ib_qp_flags { > + MLX4_IB_QP_LSO = 1 << 0 > +}; > + > struct mlx4_ib_qp { > struct ib_qp ibqp; > struct mlx4_qp mqp; > @@ -129,6 +133,7 @@ struct 
mlx4_ib_qp { > struct mlx4_mtt mtt; > int buf_size; > struct mutex mutex; > + u32 flags; > u8 port; > u8 alt_port; > u8 atomic_rd_en; > diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c > index 320c25f..8ddb97e 100644 > --- a/drivers/infiniband/hw/mlx4/qp.c > +++ b/drivers/infiniband/hw/mlx4/qp.c > @@ -71,6 +71,7 @@ enum { > > static const __be32 mlx4_ib_opcode[] = { > [IB_WR_SEND] = __constant_cpu_to_be32(MLX4_OPCODE_SEND), > + [IB_WR_LSO] = __constant_cpu_to_be32(MLX4_OPCODE_LSO), > [IB_WR_SEND_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_SEND_IMM), > [IB_WR_RDMA_WRITE] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE), > [IB_WR_RDMA_WRITE_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE_IMM), > @@ -242,7 +243,7 @@ static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type) > } > } > > -static int send_wqe_overhead(enum ib_qp_type type) > +static int send_wqe_overhead(enum ib_qp_type type, u32 flags) > { > /* > * UD WQEs must have a datagram segment. > @@ -253,7 +254,8 @@ static int send_wqe_overhead(enum ib_qp_type type) > switch (type) { > case IB_QPT_UD: > return sizeof (struct mlx4_wqe_ctrl_seg) + > - sizeof (struct mlx4_wqe_datagram_seg); > + sizeof (struct mlx4_wqe_datagram_seg) + > + (flags & MLX4_IB_QP_LSO) ? 
64 : 0; > case IB_QPT_UC: > return sizeof (struct mlx4_wqe_ctrl_seg) + > sizeof (struct mlx4_wqe_raddr_seg); > @@ -315,7 +317,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, > /* Sanity check SQ size before proceeding */ > if (cap->max_send_wr > dev->dev->caps.max_wqes || > cap->max_send_sge > dev->dev->caps.max_sq_sg || > - cap->max_inline_data + send_wqe_overhead(type) + > + cap->max_inline_data + send_wqe_overhead(type, qp->flags) + > sizeof (struct mlx4_wqe_inline_seg) > dev->dev->caps.max_sq_desc_sz) > return -EINVAL; > > @@ -329,7 +331,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, > > s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), > cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) + > - send_wqe_overhead(type); > + send_wqe_overhead(type, qp->flags); > > /* > * Hermon supports shrinking WQEs, such that a single work > @@ -394,7 +396,8 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, > } > > qp->sq.max_gs = ((qp->sq_max_wqes_per_wr << qp->sq.wqe_shift) - > - send_wqe_overhead(type)) / sizeof (struct mlx4_wqe_data_seg); > + send_wqe_overhead(type, qp->flags)) / > + sizeof (struct mlx4_wqe_data_seg); > > qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + > (qp->sq.wqe_cnt << qp->sq.wqe_shift); > @@ -503,6 +506,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, > } else { > qp->sq_no_prefetch = 0; > > + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO) > + qp->flags |= MLX4_IB_QP_LSO; > + > err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp); > if (err) > goto err; > @@ -673,7 +679,11 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, > struct mlx4_ib_qp *qp; > int err; > > - if (init_attr->create_flags) > + /* We only support LSO, and only for kernel UD QPs. 
*/ > + if (init_attr->create_flags & ~IB_QP_CREATE_IPOIB_UD_LSO) > + return ERR_PTR(-EINVAL); > + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO && > + (pd->uobject || init_attr->qp_type != IB_QPT_UD)) > return ERR_PTR(-EINVAL); > > switch (init_attr->qp_type) { > @@ -879,10 +889,15 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, > } > } > > - if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI || > - ibqp->qp_type == IB_QPT_UD) > + if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI) > context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; > - else if (attr_mask & IB_QP_PATH_MTU) { > + else if (ibqp->qp_type == IB_QPT_UD) { > + if (qp->flags & MLX4_IB_QP_LSO) > + context->mtu_msgmax = (IB_MTU_4096 << 5) | > + ilog2(dev->dev->caps.max_gso_sz); > + else > + context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; > + } else if (attr_mask & IB_QP_PATH_MTU) { > if (attr->path_mtu < IB_MTU_256 || attr->path_mtu > IB_MTU_4096) { > printk(KERN_ERR "path MTU (%u) is invalid\n", > attr->path_mtu); > @@ -1399,6 +1414,34 @@ static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg) > dseg->addr = cpu_to_be64(sg->addr); > } > > +static int build_lso_seg(struct mlx4_lso_seg *wqe, struct ib_send_wr *wr, > + struct mlx4_ib_qp *qp, unsigned *lso_seg_len) > +{ > + unsigned halign = ALIGN(wr->wr.ud.hlen, 16); > + > + /* > + * This is a temporary limitation and will be removed in > + * a forthcoming FW release: > + */ > + if (unlikely(wr->wr.ud.hlen) > 60) > + return -EINVAL; > + > + if (unlikely(!(qp->flags & MLX4_IB_QP_LSO) && > + wr->num_sge > qp->sq.max_gs - (halign >> 4))) > + return -EINVAL; > + > + memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen); > + > + /* make sure LSO header is written before overwriting stamping */ > + wmb(); > + > + wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 | > + wr->wr.ud.hlen); > + > + *lso_seg_len = halign; > + return 0; > +} > + > int mlx4_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr, > struct ib_send_wr **bad_wr) > { > @@ -1412,6 +1455,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, > unsigned ind; > int uninitialized_var(stamp); > int uninitialized_var(size); > + unsigned seglen; > int i; > > spin_lock_irqsave(&qp->sq.lock, flags); > @@ -1490,6 +1534,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, > set_datagram_seg(wqe, wr); > wqe += sizeof (struct mlx4_wqe_datagram_seg); > size += sizeof (struct mlx4_wqe_datagram_seg) / 16; > + > + if (wr->opcode == IB_WR_LSO) { > + err = build_lso_seg(wqe, wr, qp, &seglen); > + if (err) { > + *bad_wr = wr; > + goto out; > + } > + wqe += seglen; > + size += seglen / 16; > + } > break; > > case IB_QPT_SMI: > diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c > index f494c3e..d82f275 100644 > --- a/drivers/net/mlx4/fw.c > +++ b/drivers/net/mlx4/fw.c > @@ -133,6 +133,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) > #define QUERY_DEV_CAP_MAX_AV_OFFSET 0x27 > #define QUERY_DEV_CAP_MAX_REQ_QP_OFFSET 0x29 > #define QUERY_DEV_CAP_MAX_RES_QP_OFFSET 0x2b > +#define QUERY_DEV_CAP_MAX_GSO_OFFSET 0x2d > #define QUERY_DEV_CAP_MAX_RDMA_OFFSET 0x2f > #define QUERY_DEV_CAP_RSZ_SRQ_OFFSET 0x33 > #define QUERY_DEV_CAP_ACK_DELAY_OFFSET 0x35 > @@ -215,6 +216,13 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) > dev_cap->max_requester_per_qp = 1 << (field & 0x3f); > MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RES_QP_OFFSET); > dev_cap->max_responder_per_qp = 1 << (field & 0x3f); > + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GSO_OFFSET); > + field &= 0x1f; > + if (!field) > + dev_cap->max_gso_sz = 0; > + else > + dev_cap->max_gso_sz = 1 << field; > + > MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RDMA_OFFSET); > dev_cap->max_rdma_global = 1 << (field & 0x3f); > MLX4_GET(field, outbox, QUERY_DEV_CAP_ACK_DELAY_OFFSET); > @@ -377,6 +385,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap 
*dev_cap) > dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); > mlx4_dbg(dev, "Max RQ desc size: %d, max RQ S/G: %d\n", > dev_cap->max_rq_desc_sz, dev_cap->max_rq_sg); > + mlx4_dbg(dev, "Max GSO size: %d\n", dev_cap->max_gso_sz); > > dump_dev_cap_flags(dev, dev_cap->flags); > > diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h > index e16dec8..306cb9b 100644 > --- a/drivers/net/mlx4/fw.h > +++ b/drivers/net/mlx4/fw.h > @@ -96,6 +96,7 @@ struct mlx4_dev_cap { > u8 bmme_flags; > u32 reserved_lkey; > u64 max_icm_sz; > + int max_gso_sz; > }; > > struct mlx4_adapter { > diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c > index 08bfc13..7cfbe75 100644 > --- a/drivers/net/mlx4/main.c > +++ b/drivers/net/mlx4/main.c > @@ -159,6 +159,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) > dev->caps.page_size_cap = ~(u32) (dev_cap->min_page_sz - 1); > dev->caps.flags = dev_cap->flags; > dev->caps.stat_rate_support = dev_cap->stat_rate_support; > + dev->caps.max_gso_sz = dev_cap->max_gso_sz; > > return 0; > } > diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h > index 6cdf813..ff7df1a 100644 > --- a/include/linux/mlx4/device.h > +++ b/include/linux/mlx4/device.h > @@ -186,6 +186,7 @@ struct mlx4_caps { > u32 flags; > u16 stat_rate_support; > u8 port_width_cap[MLX4_MAX_PORTS + 1]; > + int max_gso_sz; > }; > > struct mlx4_buf_list { > diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h > index 31f9eb3..cf0bf4e 100644 > --- a/include/linux/mlx4/qp.h > +++ b/include/linux/mlx4/qp.h > @@ -219,6 +219,11 @@ struct mlx4_wqe_datagram_seg { > __be32 reservd[2]; > }; > > +struct mlx4_lso_seg { > + __be32 mss_hdr_size; > + __be32 header[0]; > +}; > + > struct mlx4_wqe_bind_seg { > __be32 flags1; > __be32 flags2; From sashak at voltaire.com Wed Apr 2 06:42:50 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 2 Apr 2008 13:42:50 +0000 Subject: [ofa-general] Re: [PATCH] libibmad/dump: support VLArb
table size, fix printing In-Reply-To: <20080401225418.GF29410@sgi.com> References: <20080329121252.GY13708@sashak.voltaire.com> <20080401225418.GF29410@sgi.com> Message-ID: <20080402134250.GH30617@sashak.voltaire.com> On 15:54 Tue 01 Apr , akepner at sgi.com wrote: > On Sat, Mar 29, 2008 at 12:12:52PM +0000, Sasha Khapyorsky wrote: > > > > Add support for VLArb table size. Fix printing, eliminate intermediate > > buffers, some other cleanups. > > > > Signed-off-by: Sasha Khapyorsky > > --- > > > > Arthur, could you try this? > > .... > > Tested-by: Arthur Kepner > > Yes, I tried it (along with the infiniband-diags patch) and > that fixes things. Thanks! Thanks for looking at this. I committed the fixes. Sasha From holt at sgi.com Wed Apr 2 03:59:25 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 2 Apr 2008 05:59:25 -0500 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402064952.GF19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: <20080402105925.GC22493@sgi.com> On Wed, Apr 02, 2008 at 08:49:52AM +0200, Andrea Arcangeli wrote: > Most other patches will apply cleanly on top of my coming mmu > notifiers #v10 that I hope will go in -mm. > > For #v10 the only two left open issues to discuss are: Does your v10 allow sleeping inside the callbacks? Thanks, Robin From andrea at qumranet.com Wed Apr 2 04:16:51 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 13:16:51 +0200 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402105925.GC22493@sgi.com> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402105925.GC22493@sgi.com> Message-ID: <20080402111651.GN19189@duo.random> On Wed, Apr 02, 2008 at 05:59:25AM -0500, Robin Holt wrote: > On Wed, Apr 02, 2008 at 08:49:52AM +0200, Andrea Arcangeli wrote: > > Most other patches will apply cleanly on top of my coming mmu > > notifiers #v10 that I hope will go in -mm.
> > > > For #v10 the only two left open issues to discuss are: > > Does your v10 allow sleeping inside the callbacks? Yes if you apply all the patches. But not if you apply the first patch only; most patches in the EMM series will apply cleanly or with minor rejects to #v10 too. Christoph's further work to make EMM sleep capable looks very good and it's going to be 100% shared; it's also going to be a lot more controversial for merging than the two #v10 or EMM first patch. EMM also doesn't allow sleeping inside the callbacks if you only apply the first patch in the series. My priority is to get #v9 or the coming #v10 merged in -mm (only difference will be the replacement of rcu_read_lock with the seqlock to avoid breaking the synchronize_rcu in GRU code). I will mix seqlock with rcu ordered writes. EMM indeed breaks GRU by making synchronize_rcu a noop and by not providing any alternative (I will obsolete synchronize_rcu making it a noop instead). This assumes Jack used synchronize_rcu for whatever good reason. But this isn't the real strong point against EMM; adding seqlock to EMM is as easy as adding it to #v10 (admittedly with #v10 it is a bit easier because I didn't expand the hlist operations for zero gain like in EMM).
URL: From eli at dev.mellanox.co.il Wed Apr 2 04:41:33 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 02 Apr 2008 14:41:33 +0300 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: References: <1206452112.25950.360.camel@mtls03> Message-ID: <1207136493.3781.59.camel@mtls03> On Tue, 2008-04-01 at 12:59 -0700, Roland Dreier wrote: > > + halign = ALIGN(wr->wr.ud.hlen, 16); > > This doesn't seem connected to the problem I see, but is this correct? > Suppose hlen is 48... then halign will be 48 but it really should be > 64 I think. Do we really want > > halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); > > instead? > I don't think so, at least in the case that hlen equals 48 which is a valid one since the total length used by the LSO segment would be 48 + 4 which requires 4 * 16 bytes chunks. If we'd use the above statement the send would fail. Anyway I think this function should look like this: static int build_lso_seg(struct mlx4_lso_seg *wqe, struct ib_send_wr *wr, struct mlx4_ib_qp *qp, unsigned *lso_seg_len) { unsigned halign = ALIGN(wr->wr.ud.hlen + 4, 16); /* * This is a temporary limitation and will be removed in * a forthcoming FW release: */ if (unlikely(halign > 64)) return -EINVAL; if (unlikely(!(qp->flags & MLX4_IB_QP_LSO) && wr->num_sge > qp->sq.max_gs - (halign >> 4))) return -EINVAL; memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen); /* make sure LSO header is written before overwriting stamping */ wmb(); wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 | wr->wr.ud.hlen); *lso_seg_len = halign; return 0; } And also I suggest to use these too: @@ -1539,7 +1539,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, if (wr->opcode == IB_WR_LSO) { err = build_lso_seg(wqe, wr, qp, &seglen); - if (err) { + if (unlikely(err)) { *bad_wr = wr; goto out; } @@ -1551,7 +1551,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, case IB_QPT_SMI: case IB_QPT_GSI: err = 
build_mlx_header(to_msqp(qp), wr, ctrl, &seglen); - if (err) { + if (unlikely(err)) { *bad_wr = wr; goto out; } @@ -1594,7 +1594,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, */ wmb(); - if (wr->opcode < 0 || wr->opcode >= ARRAY_SIZE(mlx4_ib_opcode)) { + if (unlikely(wr->opcode < 0 || wr->opcode >= ARRAY_SIZE(mlx4_ib_opcode))) { err = -EINVAL; goto out; }
From eli at dev.mellanox.co.il Wed Apr 2 04:52:57 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 02 Apr 2008 14:52:57 +0300 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: References: Message-ID: <1207137177.3781.67.camel@mtls03> WRT fb9fbf7cc5301a914e099d95d8f9a46a34e58aee Since send with immediate and send with invalidate are mutually exclusive, wouldn't it make sense to use a union for both the immediate value and the invalidated rkey?
Also it seems like this commit touches code in both ib core and in hw drivers. From tziporet at dev.mellanox.co.il Wed Apr 2 05:31:32 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 02 Apr 2008 15:31:32 +0300 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: Message-ID: <47F37CA4.8000109@mellanox.co.il> Roland Dreier wrote: > Core: > > - I did a bunch of cleanups all over drivers/infiniband and the > gcc and sparse warning noise is down to a pretty reasonable level. > Further cleanups welcome of course. > We want to add send with invalidate & masked compare and swap. Eli will be able to send the patches next week, and since they are small I think they can be in for 2.6.26. > ULPs: > > - I merged Eli's IPoIB stateless offload changes for checksum > offload and LSO changes. The interrupt moderation changes are > next, and should not be a problem to merge. Please test IPoIB > on all sorts of hardware! > What about the split CQ for UD mode? It has improved IPoIB performance for small messages significantly. > > HW specific: > > mlx4 - we plan to send patches for the low-level driver only, to enable mlx4_en. These only affect our low-level driver. Should be ready next week. I hope these can get in too. > Here are a few topics that I believe will not be ready in time for the > 2.6.26 window and will need to wait for 2.6.27 at least: > > - XRC. I still don't have a good feeling that we have settled on all > the nuances of the ABI we want to expose to userspace for this, and > ideally I would like to understand how ehca LL QPs fit into the > picture as well. > I think we should try to push for XRC in 2.6.26, since there are already MPI implementations that use it, and this ties them to OFED only. Also, this feature is stable and is now being defined in the IBTA. Not taking it causes divergence between OFED, the kernel, and your libibverbs, and we wish to avoid such gaps.
Is there anything we can do to help and make it into 2.6.26? From jackm at dev.mellanox.co.il Wed Apr 2 06:15:44 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 2 Apr 2008 16:15:44 +0300 Subject: [ofa-general] [PATCH] mlx4: make firmware diagnostic counters available via sysfs Message-ID: <200804021615.44982.jackm@dev.mellanox.co.il> mlx4: make firmware diagnostic counters available via sysfs. Developed by: Gabi Liron of Mellanox. Signed-off-by: Jack Morgenstein --- Roland, Please queue this up for kernel 2.6.26. Thanks! Jack Index: infiniband/drivers/net/mlx4/fw.c =================================================================== --- infiniband.orig/drivers/net/mlx4/fw.c 2008-02-05 09:32:14.000000000 +0200 +++ infiniband/drivers/net/mlx4/fw.c 2008-04-02 16:06:05.000000000 +0300 @@ -827,3 +827,40 @@ int mlx4_NOP(struct mlx4_dev *dev) /* Input modifier of 0x1f means "finish as soon as possible." */ return mlx4_cmd(dev, 0, 0x1f, 0, MLX4_CMD_NOP, 100); } + +int mlx4_query_diag_counters(struct mlx4_dev *dev, int array_length, + int in_modifier, unsigned int in_offset[], + u32 counter_out[]) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 *outbox; + u32 op_modifer = (u32)in_modifier; + int ret; + int i; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + outbox = mailbox->buf; + + ret = mlx4_cmd_box(dev, 0, mailbox->dma, 0, op_modifer, + MLX4_CMD_DIAG_RPRT, MLX4_CMD_TIME_CLASS_A); + if (ret) + goto out; + + for (i = 0; i < array_length; i++) { + if (in_offset[i] > MLX4_MAILBOX_SIZE) { + ret = -1; + goto out; + } + + MLX4_GET(counter_out[i], outbox, in_offset[i]); + } + ret = 0; + +out: + mlx4_free_cmd_mailbox(dev, mailbox); + return ret; +} +EXPORT_SYMBOL_GPL(mlx4_query_diag_counters); + Index: infiniband/include/linux/mlx4/device.h =================================================================== --- infiniband.orig/include/linux/mlx4/device.h 2008-02-10 16:33:29.000000000 +0200 +++ infiniband/include/linux/mlx4/device.h 2008-04-02 16:06:05.000000000
+0300 @@ -368,5 +368,8 @@ void mlx4_fmr_unmap(struct mlx4_dev *dev u32 *lkey, u32 *rkey); int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_SYNC_TPT(struct mlx4_dev *dev); +int mlx4_query_diag_counters(struct mlx4_dev *melx4_dev, int array_length, + int in_modifier, unsigned int in_offset[], + u32 counter_out[]); #endif /* MLX4_DEVICE_H */ Index: infiniband/drivers/infiniband/hw/mlx4/main.c =================================================================== --- infiniband.orig/drivers/infiniband/hw/mlx4/main.c 2008-02-27 16:21:35.000000000 +0200 +++ infiniband/drivers/infiniband/hw/mlx4/main.c 2008-04-02 16:06:05.000000000 +0300 @@ -515,6 +515,155 @@ static struct class_device_attribute *ml &class_device_attr_board_id }; +/* + * create 2 functions (show, store) and a class_device_attribute struct + * pointing to the functions for _name + */ +#define CLASS_DEVICE_DIAG_CLR_RPRT_ATTR(_name, _offset, _in_mod) \ +static ssize_t store_rprt_##_name(struct class_device *cdev, \ + const char *buf, size_t length) { \ + return store_diag_rprt(cdev, buf, length, _offset, _in_mod); \ +} \ +static ssize_t show_rprt_##_name(struct class_device *cdev, char *buf) { \ + return show_diag_rprt(cdev, buf, _offset, _in_mod); \ +} \ +static CLASS_DEVICE_ATTR(_name, S_IRUGO | S_IWUGO, \ + show_rprt_##_name, store_rprt_##_name); + +/* + * create show function and a class_device_attribute struct pointing to + * the function for _name + */ +#define CLASS_DEVICE_DIAG_RPRT_ATTR(_name, _offset, _in_mod) \ +static ssize_t show_rprt_##_name(struct class_device *cdev, char *buf){ \ + return show_diag_rprt(cdev, buf, _offset, _in_mod); \ +} \ +static CLASS_DEVICE_ATTR(_name, S_IRUGO, show_rprt_##_name, NULL); + +static ssize_t show_diag_rprt(struct class_device *cdev, char *buf, + int offset, int in_mod) +{ + size_t ret = -1; + u32 counter_offset = offset; + u32 diag_counter = 0; + struct mlx4_ib_dev *dev = container_of(cdev, struct mlx4_ib_dev, + ib_dev.class_dev); + /* clear 
counters file, can't read it */ + if(offset < 0) + return sprintf(buf,"This file is write only\n"); + + ret = mlx4_query_diag_counters(dev->dev, 1, in_mod, &counter_offset, + &diag_counter); + if (ret < 0) + { + sprintf(buf,"Operation failed\n"); + return ret; + } + + return sprintf(buf,"%d\n", diag_counter); +} + +/* the store function is used for counter clear */ +static ssize_t store_diag_rprt(struct class_device *cdev, + const char *buf, size_t length, + int offset, int in_mod) +{ + size_t ret = -1; + u32 counter_offset = 0; + u32 diag_counter; + struct mlx4_ib_dev *dev = container_of(cdev, struct mlx4_ib_dev, + ib_dev.class_dev); + + ret = mlx4_query_diag_counters(dev->dev, 1, in_mod, &counter_offset, + &diag_counter); + if (ret) + return ret; + + return length; +} + +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_lle , 0x00, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_lle , 0x04, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_lqpoe , 0x08, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_lqpoe , 0x0C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_leeoe , 0x10, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_leeoe , 0x14, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_lpe , 0x18, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_lpe , 0x1C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_wrfe , 0x20, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_wrfe , 0x24, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_mwbe , 0x2C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_bre , 0x34, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_lae , 0x38, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rire , 0x44, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_rire , 0x48, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rae , 0x4C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_rae , 0x50, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_roe , 0x54, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_tree , 0x5C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rree , 0x64, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_rnr , 0x68, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rnr , 0x6C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rabrte , 0x7C, 2); 
+CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_ieecne , 0x84, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_ieecse , 0x8C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_oos , 0x100, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_oos , 0x104, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_mce , 0x108, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_rsync , 0x110, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rsync , 0x114, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_udsdprd , 0x118, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_ucsdprd , 0x120, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(num_cqovf , 0x1A0, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(num_eqovf , 0x1A4, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(num_baddb , 0x1A8, 2); +CLASS_DEVICE_DIAG_CLR_RPRT_ATTR(clear_diag , -1 , 3); + +static struct attribute *diag_rprt_attrs[] = { + &class_device_attr_rq_num_lle.attr, + &class_device_attr_sq_num_lle.attr, + &class_device_attr_rq_num_lqpoe.attr, + &class_device_attr_sq_num_lqpoe.attr, + &class_device_attr_rq_num_leeoe.attr, + &class_device_attr_sq_num_leeoe.attr, + &class_device_attr_rq_num_lpe.attr, + &class_device_attr_sq_num_lpe.attr, + &class_device_attr_rq_num_wrfe.attr, + &class_device_attr_sq_num_wrfe.attr, + &class_device_attr_sq_num_mwbe.attr, + &class_device_attr_sq_num_bre.attr, + &class_device_attr_rq_num_lae.attr, + &class_device_attr_sq_num_rire.attr, + &class_device_attr_rq_num_rire.attr, + &class_device_attr_sq_num_rae.attr, + &class_device_attr_rq_num_rae.attr, + &class_device_attr_sq_num_roe.attr, + &class_device_attr_sq_num_tree.attr, + &class_device_attr_sq_num_rree.attr, + &class_device_attr_rq_num_rnr.attr, + &class_device_attr_sq_num_rnr.attr, + &class_device_attr_sq_num_rabrte.attr, + &class_device_attr_sq_num_ieecne.attr, + &class_device_attr_sq_num_ieecse.attr, + &class_device_attr_rq_num_oos.attr, + &class_device_attr_sq_num_oos.attr, + &class_device_attr_rq_num_mce.attr, + &class_device_attr_rq_num_rsync.attr, + &class_device_attr_sq_num_rsync.attr, + &class_device_attr_rq_num_udsdprd.attr, + 
&class_device_attr_rq_num_ucsdprd.attr, + &class_device_attr_num_cqovf.attr, + &class_device_attr_num_eqovf.attr, + &class_device_attr_num_baddb.attr, + &class_device_attr_clear_diag.attr, + NULL +}; + +static struct attribute_group diag_counters_group = { + .name = "diag_counters", + .attrs = diag_rprt_attrs +}; + static void *mlx4_ib_add(struct mlx4_dev *dev) { static int mlx4_ib_version_printed; @@ -638,8 +787,14 @@ static void *mlx4_ib_add(struct mlx4_dev goto err_reg; } + if(sysfs_create_group(&ibdev->ib_dev.class_dev.kobj, &diag_counters_group)) + goto err_diag; + return ibdev; +err_diag: + ib_unregister_device(&ibdev->ib_dev); + err_reg: ib_unregister_device(&ibdev->ib_dev); @@ -663,6 +818,8 @@ static void mlx4_ib_remove(struct mlx4_d struct mlx4_ib_dev *ibdev = ibdev_ptr; int p; + sysfs_remove_group(&ibdev->ib_dev.class_dev.kobj, &diag_counters_group); + for (p = 1; p <= dev->caps.num_ports; ++p) mlx4_CLOSE_PORT(dev, p); From holt at sgi.com Wed Apr 2 07:26:10 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 2 Apr 2008 09:26:10 -0500 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402111651.GN19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402105925.GC22493@sgi.com> <20080402111651.GN19189@duo.random> Message-ID: <20080402142609.GD22493@sgi.com> I must have missed v10. Could you repost so I can build xpmem against it to see how it operates? To help reduce confusion, you should probably commandeer the patches from Christoph's set which you think are needed to make it sleep.
Thanks, Robin On Wed, Apr 02, 2008 at 01:16:51PM +0200, Andrea Arcangeli wrote: > On Wed, Apr 02, 2008 at 05:59:25AM -0500, Robin Holt wrote: > > On Wed, Apr 02, 2008 at 08:49:52AM +0200, Andrea Arcangeli wrote: > > > Most other patches will apply cleanly on top of my coming mmu > > > notifiers #v10 that I hope will go in -mm. > > > > > > For #v10 the only two left open issues to discuss are: > > > > Does your v10 allow sleeping inside the callbacks? > > Yes if you apply all the patches. But not if you apply the first patch > only, most patches in EMM serie will apply cleanly or with minor > rejects to #v10 too, Christoph's further work to make EEM sleep > capable looks very good and it's going to be 100% shared, it's also > going to be a lot more controversial for merging than the two #v10 or > EMM first patch. EMM also doesn't allow sleeping inside the callbacks > if you only apply the first patch in the serie. > > My priority is to get #v9 or the coming #v10 merged in -mm (only > difference will be the replacement of rcu_read_lock with the seqlock > to avoid breaking the synchronize_rcu in GRU code). I will mix seqlock > with rcu ordered writes. EMM indeed breaks GRU by making > synchronize_rcu a noop and by not providing any alternative (I will > obsolete synchronize_rcu making it a noop instead). This assumes Jack > used synchronize_rcu for whatever good reason. But this isn't the real > strong point against EMM, adding seqlock to EMM is as easy as adding > it to #v10 (admittedly with #v10 is a bit easier because I didn't > expand the hlist operations for zero gain like in EMM). 
> -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ From mashirle at us.ibm.com Wed Apr 2 00:22:12 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Wed, 02 Apr 2008 00:22:12 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: Message-ID: <1207120932.4593.47.camel@localhost.localdomain> What's the status of RDS? Thanks Shirley From rdreier at cisco.com Wed Apr 2 08:22:04 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 08:22:04 -0700 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: <1207130679.3781.50.camel@mtls03> (Eli Cohen's message of "Wed, 02 Apr 2008 13:04:39 +0300") References: <1205767431.25950.138.camel@mtls03> <1207064146.3781.19.camel@mtls03> <1207130679.3781.50.camel@mtls03> Message-ID: > - (flags & MLX4_IB_QP_LSO) ? 64 : 0; > + ((flags & MLX4_IB_QP_LSO) ? 64 : 0); Ugh, thanks, I've rolled that up into the patch. Sorry for messing things up...
From rdreier at cisco.com Wed Apr 2 08:26:58 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 08:26:58 -0700 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <1207136493.3781.59.camel@mtls03> (Eli Cohen's message of "Wed, 02 Apr 2008 14:41:33 +0300") References: <1206452112.25950.360.camel@mtls03> <1207136493.3781.59.camel@mtls03> Message-ID: Not sure I follow. Given that we have struct mlx4_lso_seg { __be32 mss_hdr_size; __be32 header[0]; }; I don't see much difference between my proposal > halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); and yours > halign = ALIGN(wr->wr.ud.hlen + 4, 16); since isn't sizeof *wqe == 4? > I don't think so, at least in the case that hlen equals 48 which is a > valid one since the total length used by the LSO segment would be 48 + 4 > which requires 4 * 16 bytes chunks. If we'd use the above statement the > send would fail. But the point is that the current code would only bump the wqe pointer by 48 bytes and the last 4 bytes of the header would be overwritten by the next data segment. - R.
From rdreier at cisco.com Wed Apr 2 08:27:36 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 08:27:36 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <1207120932.4593.47.camel@localhost.localdomain> (Shirley Ma's message of "Wed, 02 Apr 2008 00:22:12 -0700") References: <1207120932.4593.47.camel@localhost.localdomain> Message-ID: > What's the status of RDS? I've never seen any patches. I guess ask the RDS guys if/when they want to start working on getting RDS merged. - R. From eli at dev.mellanox.co.il Wed Apr 2 08:57:30 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 02 Apr 2008 18:57:30 +0300 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: References: <1206452112.25950.360.camel@mtls03> <1207136493.3781.59.camel@mtls03> Message-ID: <1207151850.3781.86.camel@mtls03> On Wed, 2008-04-02 at 08:26 -0700, Roland Dreier wrote: > Not sure I follow. Given that we have > > struct mlx4_lso_seg { > __be32 mss_hdr_size; > __be32 header[0]; > }; > > I don't see much difference between my proposal > > > halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); > > and yours > > > halign = ALIGN(wr->wr.ud.hlen + 4, 16); > > since isn't sizeof *wqe == 4? Right, I missed that. > > > I don't think so, at least in the case that hlen equals 48 which is a > > valid one since the total length used by the LSO segment would be 48 + 4 > > which requires 4 * 16 bytes chunks. If we'd use the above statement the > send would fail. > > But the point is that the current code would only bump the wqe pointer > by 48 bytes and the last 4 bytes of the header would be overwritten by > the next data segment.
> Given the fact that sizeof *wqe == 4 then what you propose seems to be a correct approach. But I do think that this is equivalent but looks cleaner: - unsigned halign = ALIGN(wr->wr.ud.hlen, 16); + unsigned halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); - if (unlikely(wr->wr.ud.hlen) > 60) + if (unlikely(halign > 64)) return -EINVAL;
From rdreier at cisco.com Wed Apr 2 09:02:57 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:02:57 -0700 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <1207151850.3781.86.camel@mtls03> (Eli Cohen's message of "Wed, 02 Apr 2008 18:57:30 +0300") References: <1206452112.25950.360.camel@mtls03> <1207136493.3781.59.camel@mtls03> <1207151850.3781.86.camel@mtls03> Message-ID: > - unsigned halign = ALIGN(wr->wr.ud.hlen, 16); > + unsigned halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); > > > - if (unlikely(wr->wr.ud.hlen) > 60) > + if (unlikely(halign > 64)) Sure, makes sense.
From rdreier at cisco.com Wed Apr 2 09:04:14 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:04:14 -0700 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: (Roland Dreier's message of "Wed, 02 Apr 2008 09:02:57 -0700") References: <1206452112.25950.360.camel@mtls03> <1207136493.3781.59.camel@mtls03> <1207151850.3781.86.camel@mtls03> Message-ID: > - if (unlikely(wr->wr.ud.hlen) > 60) > + if (unlikely(halign > 64)) heh, just noticed that we used to have unlikely(wr->wr.ud.hlen) compared to 60. So the annotation was messed up :) - R. From richard.frank at oracle.com Wed Apr 2 10:11:15 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:11:15 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> Message-ID: <47F3BE33.4000204@oracle.com> What is the work we need to do here - I was thinking RDS should just work? Roland Dreier wrote: > > What's the status of RDS? > > I've never seen any patches. I guess ask the RDS guys if/when they want > to start working on getting RDS merged. > > - R.
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Wed Apr 2 09:15:36 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:15:36 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F3BE33.4000204@oracle.com> (Richard Frank's message of "Wed, 02 Apr 2008 12:11:15 -0500") References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> Message-ID: > What is the work we need to do here - I was thinking RDS should just work ? Stuff doesn't get merged into the kernel on its own. If you want RDS upstream then the first step is to post patches in a form suitable for reviewing. Then respond to the review comments. The files Documentation/SubmittingPatches and to some extent Documentation/SubmittingDrivers in the kernel source have more info. - R. From richard.frank at oracle.com Wed Apr 2 10:18:36 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:18:36 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> Message-ID: <47F3BFEC.1000400@oracle.com> Yes, I see this is for pushing RDS upstream - but what about running RDS as is over IWARP NICs - that should just work right ? Roland Dreier wrote: > > What is the work we need to do here - I was thinking RDS should just work ? > > Stuff doesn't get merged into the kernel on its own. If you want RDS > upstream then the first step is to post patches in a form suitable for > reviewing. Then respond to the review comments. 
> > The files Documentation/SubmittingPatches and to some extent > Documentation/SubmittingDrivers in the kernel source have more info. > > - R. > From rdreier at cisco.com Wed Apr 2 09:19:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:19:17 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F37CA4.8000109@mellanox.co.il> (Tziporet Koren's message of "Wed, 02 Apr 2008 15:31:32 +0300") References: <47F37CA4.8000109@mellanox.co.il> Message-ID: > We want to add send with invalidate & masked compare and swap. > Eli will be able to send the patches next week and since they are > small I think they can be in for 2.6.26 Send with invalidate should be OK. Let's see about the masked atomics stuff -- we have a ton of new verbs and I think we might want to slow down and make sure it all makes sense. > What about the split CQ for UD mode? It's improved the IPoIB > performance for small messages significantly. Oh yeah... I'll try to get that in too. > mlx4- we plan to send patches for the low level driver only to enable > mlx4_en. These only affect our low level driver. No problem in principle, let's see the actual patches. > I think we should try to push for XRC in 2.6.26 since there are > already MPI implementations that use it and this ties them to use OFED > only. > Also this feature is stable and now being defined in IBTA > Not taking it causes divergence between OFED and the kernel and your > libibverbs, and we wish to avoid such gaps. > Is there anything we can do to help and make it into 2.6.26? I don't have a good feeling that the user-kernel interface is well thought out, so I want to consider XRC + ehca LL stuff + new iWARP verbs and make sure we have something that makes sense for the future. - R.
From rdreier at cisco.com Wed Apr 2 09:21:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:21:23 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: <1207137177.3781.67.camel@mtls03> (Eli Cohen's message of "Wed, 02 Apr 2008 14:52:57 +0300") References: <1207137177.3781.67.camel@mtls03> Message-ID: > Since send with immediate and send with invalidate are mutually > exclusive, wouldn't it make sense to use a union for both the immediate > value and the invalidated rkey? maybe although that would be hard to do in libibverbs without changing the API. I'm not a big fan of anonymous unions and I don't see any other good way to do it. > Also it seems like this commit touches code in both ib core and in hw > drivers. Yes, I explained why in the changelog. - R. From richard.frank at oracle.com Wed Apr 2 10:24:07 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:24:07 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> Message-ID: <47F3C137.3070209@oracle.com> WRT to merging RDS into the kernel - our current plans are to wait to see RDS adopted by more than Oracle - before approaching the kernel community about inclusion of RDS. Roland Dreier wrote: > > What is the work we need to do here - I was thinking RDS should just work ? > > Stuff doesn't get merged into the kernel on its own. If you want RDS > upstream then the first step is to post patches in a form suitable for > reviewing. Then respond to the review comments. > > The files Documentation/SubmittingPatches and to some extent > Documentation/SubmittingDrivers in the kernel source have more info. > > - R. 
> From rdreier at cisco.com Wed Apr 2 09:25:35 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:25:35 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: <47F33837.60701@dev.mellanox.co.il> (Dotan Barak's message of "Wed, 02 Apr 2008 10:39:35 +0300") References: <47F33837.60701@dev.mellanox.co.il> Message-ID: > Why do you need the flag IBV_DEVICE_MEM_WINDOW? > If the value of device_attributes.num_mw is more than zero => the > device supports memory windows, so i think this flag > can be safely removed. OK, I'll delete it from the libibverbs changes. I guess we can kill it on the kernel side too. > I think that the send & invalidate should be a new opcode instead of a > send flag. That makes sense. All existing hardware seems to use a separate opcode in the HW WQE format, so it makes things cleaner to use a new opcode at the verbs API level too. I'll update my patches. Thanks, Roland From rdreier at cisco.com Wed Apr 2 09:26:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:26:26 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F3BFEC.1000400@oracle.com> (Richard Frank's message of "Wed, 02 Apr 2008 12:18:36 -0500") References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3BFEC.1000400@oracle.com> Message-ID: > Yes, I see this is for pushing RDS upstream - but what about running > RDS as is over IWARP NICs - that should just work right ? No idea. It depends on whether you took into account the differences between IB and iWARP. Anyway that's not really what this thread was about. 
From richard.frank at oracle.com Wed Apr 2 10:28:44 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:28:44 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3BFEC.1000400@oracle.com> Message-ID: <47F3C24C.1090904@oracle.com> got it... Roland Dreier wrote: > > Yes, I see this is for pushing RDS upstream - but what about running > > RDS as is over IWARP NICs - that should just work right ? > > No idea. It depends on whether you took into account the differences > between IB and iWARP. Anyway that's not really what this thread was about. > From sweitzen at cisco.com Wed Apr 2 09:29:43 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 09:29:43 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: <47F3C137.3070209@oracle.com> References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> Message-ID: > WRT to merging RDS into the kernel - our current plans are to wait to > see RDS adopted by more than Oracle - before approaching the kernel > community about inclusion of RDS. I've seen statements before from someone from Oracle that RDS was only for Oracle's use, for example, that person did not want netperf changed to support RDS. Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems From richard.frank at oracle.com Wed Apr 2 10:31:27 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:31:27 -0500 Subject: [ofa-general] Has anyone tried running RDS over 10GE / IWARP NICs ? Message-ID: <47F3C2EF.6010304@oracle.com> We'd appreciate some feed back on your experience and would like to sort out any issues ASAP. 
Rick From xma at us.ibm.com Wed Apr 2 09:36:12 2008 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 2 Apr 2008 09:36:12 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F3C24C.1090904@oracle.com> Message-ID: > got it... Can the maintainer submit RDS patch for mainline kernel, in 2.6.26 or 2.6.27 window? It's hard for Distros pick this feature without mainline kernel acceptance. Thanks Shirley -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Wed Apr 2 09:37:53 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:37:53 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: (Shirley Ma's message of "Wed, 2 Apr 2008 09:36:12 -0700") References: Message-ID: > Can the maintainer submit RDS patch for mainline kernel, in 2.6.26 or > 2.6.27 window? It's hard for Distros pick this feature without mainline > kernel acceptance. At least as a first order approximation, there is no chance of RDS being merged for 2.6.26 even if patches appear right this second... From richard.frank at oracle.com Wed Apr 2 10:37:45 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:37:45 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> Message-ID: <47F3C469.1020803@oracle.com> I believe there is a patch for NetPerf which supports RDS - although it may need to be updated - and submitted. The only prior discussion I can think of - was whether or not NetPerf exercises RDS as Oracle would. I'm not proposing that we should enhance NetPerf to do that (but that's OK with me). We created a tool rds-stress which does that. 
Scott Weitzenkamp (sweitzen) wrote: >> WRT to merging RDS into the kernel - our current plans are to wait to >> see RDS adopted by more than Oracle - before approaching the kernel >> community about inclusion of RDS. >> > > I've seen statements before from someone from Oracle that RDS was only > for Oracle's use, for example, that person did not want netperf changed > to support RDS. > > Scott Weitzenkamp > SQA and Release Manager > Data Center Access Engineering > Cisco Systems > From sweitzen at cisco.com Wed Apr 2 09:41:30 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 09:41:30 -0700 Subject: [ofa-general] RE: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F3C2EF.6010304@oracle.com> References: <47F3C2EF.6010304@oracle.com> Message-ID: Doesn't appear to work with Chelsio and OFED 1.3: [root at svbu-qa2950-1 counters]# ethtool -i eth2 driver: cxgb3 version: 1.0-ofed firmware-version: T 5.0.0 TP 1.1.0 bus-info: 0000:0b:00.0 [root at svbu-qa2950-1 counters]# ifconfig eth2 eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F inet addr:192.168.0.198 Bcast:192.168.0.255 Mask:255.255.255.0 inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:207891512 (198.2 MiB) TX bytes:9348152 (8.9 MiB) Interrupt:169 Memory:fceff000-fcefffff [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 rds-sink: Unable to bind socket: Cannot assign requested address Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems > -----Original Message----- > From: rds-devel-bounces at oss.oracle.com > [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of Richard Frank > Sent: Wednesday, April 02, 2008 10:31 AM > To: rds-devel at oss.oracle.com; [ofa_general] > Subject: [rds-devel]
Has anyone tried running RDS over 10GE / > IWARP NICs ? > > We'd appreciate some feed back on your experience and would > like to sort > out any issues ASAP. > > Rick > > _______________________________________________ > rds-devel mailing list > rds-devel at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/rds-devel > From richard.frank at oracle.com Wed Apr 2 10:43:45 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:43:45 -0500 Subject: [ofa-general] Re: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: References: <47F3C2EF.6010304@oracle.com> Message-ID: <47F3C5D1.5000003@oracle.com> is the rds driver loaded (modprobe rds) Scott Weitzenkamp (sweitzen) wrote: > Does't appear to work with Chelsio and OFED 1.3: > > [root at svbu-qa2950-1 counters]# ethtool -i eth2 > driver: cxgb3 > version: 1.0-ofed > firmware-version: T 5.0.0 TP 1.1.0 > bus-info: 0000:0b:00.0 > [root at svbu-qa2950-1 counters]# ifconfig eth2 > eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F > inet addr:192.168.0.198 Bcast:192.168.0.255 > Mask:255.255.255.0 > inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 > TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:207891512 (198.2 MiB) TX bytes:9348152 (8.9 MiB) > Interrupt:169 Memory:fceff000-fcefffff > > [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 > rds-sink: Unable to bind socket: Cannot assign requested address > > Scott Weitzenkamp > SQA and Release Manager > Data Center Access Engineering > Cisco Systems > > > > > >> -----Original Message----- >> From: rds-devel-bounces at oss.oracle.com >> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of Richard Frank >> Sent: Wednesday, April 02, 2008 10:31 AM >> To: rds-devel at oss.oracle.com; [ofa_general] >> Subject: [rds-devel] Has anyone tried 
running RDS over 10GE / >> IWARP NICs ? >> >> We'd appreciate some feed back on your experience and would >> like to sort >> out any issues ASAP. >> >> Rick >> >> _______________________________________________ >> rds-devel mailing list >> rds-devel at oss.oracle.com >> http://oss.oracle.com/mailman/listinfo/rds-devel >> >> From sweitzen at cisco.com Wed Apr 2 09:46:48 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 09:46:48 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: <47F3C469.1020803@oracle.com> References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> <47F3C469.1020803@oracle.com> Message-ID: Rich, On Nov 1, 2007, you wrote this to rds-devel: "Netperf is too simplistic in that all it seems to do is stream data in a simple loop. This is not how Oracle uses the IPC and again does not reflect what it would take to make UDP reliable. For this reason we are not interested in having Netperf support RDS and or seeing Netperf data." I would like to see RDS supported by existing common tools like netperf, iperf, etc. so we can easily compare how RDS performs to UDP for IPC models other than Oracle. Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems > -----Original Message----- > From: Richard Frank [mailto:richard.frank at oracle.com] > Sent: Wednesday, April 02, 2008 10:38 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Roland Dreier (rdreier); rds-devel at oss.oracle.com; > linux-kernel at vger.kernel.org; general at lists.openfabrics.org > Subject: Re: [ofa-general] InfiniBand/iWARP/RDMA merge plans > for 2.6.26 (what'sin infiniband.git) > > I believe there is a patch for NetPerf which supports RDS - > although it > may need to be updated - and submitted. > > The only prior discussion I can think of - was whether or not NetPerf > exercises RDS as Oracle would. 
> > I'm not proposing that we should enhance NetPerf to do that > (but that's > OK with me). > > We created a tool rds-stress which does that. > > Scott Weitzenkamp (sweitzen) wrote: > >> WRT to merging RDS into the kernel - our current plans are > to wait to > >> see RDS adopted by more than Oracle - before approaching > the kernel > >> community about inclusion of RDS. > >> > > > > I've seen statements before from someone from Oracle that > RDS was only > > for Oracle's use, for example, that person did not want > netperf changed > > to support RDS. > > > > Scott Weitzenkamp > > SQA and Release Manager > > Data Center Access Engineering > > Cisco Systems > > > From sweitzen at cisco.com Wed Apr 2 09:47:18 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 09:47:18 -0700 Subject: [ofa-general] RE: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F3C5D1.5000003@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> Message-ID: Yes, it's loaded, and dmesg says this: Registered RDS/ib transport Registered RDS/tcp transport NET: Registered protocol family 28 Scott > -----Original Message----- > From: Richard Frank [mailto:richard.frank at oracle.com] > Sent: Wednesday, April 02, 2008 10:44 AM > To: Scott Weitzenkamp (sweitzen) > Cc: rds-devel at oss.oracle.com; [ofa_general] > Subject: Re: [rds-devel] Has anyone tried running RDS over > 10GE / IWARP NICs ? 
> > is the rds driver loaded (modprobe rds) > > Scott Weitzenkamp (sweitzen) wrote: > > Does't appear to work with Chelsio and OFED 1.3: > > > > [root at svbu-qa2950-1 counters]# ethtool -i eth2 > > driver: cxgb3 > > version: 1.0-ofed > > firmware-version: T 5.0.0 TP 1.1.0 > > bus-info: 0000:0b:00.0 > > [root at svbu-qa2950-1 counters]# ifconfig eth2 > > eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F > > inet addr:192.168.0.198 Bcast:192.168.0.255 > > Mask:255.255.255.0 > > inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 > > TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 > > collisions:0 txqueuelen:1000 > > RX bytes:207891512 (198.2 MiB) TX bytes:9348152 (8.9 MiB) > > Interrupt:169 Memory:fceff000-fcefffff > > > > [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 > > rds-sink: Unable to bind socket: Cannot assign requested address > > > > Scott Weitzenkamp > > SQA and Release Manager > > Data Center Access Engineering > > Cisco Systems > > > > > > > > > > > >> -----Original Message----- > >> From: rds-devel-bounces at oss.oracle.com > >> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of > Richard Frank > >> Sent: Wednesday, April 02, 2008 10:31 AM > >> To: rds-devel at oss.oracle.com; [ofa_general] > >> Subject: [rds-devel] Has anyone tried running RDS over 10GE / > >> IWARP NICs ? > >> > >> We'd appreciate some feed back on your experience and would > >> like to sort > >> out any issues ASAP. 
> >> > >> Rick > >> > >> _______________________________________________ > >> rds-devel mailing list > >> rds-devel at oss.oracle.com > >> http://oss.oracle.com/mailman/listinfo/rds-devel > >> > >> > From weiny2 at llnl.gov Wed Apr 2 09:50:57 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Wed, 2 Apr 2008 09:50:57 -0700 Subject: [ofa-general] Reminder: OpenSM BOF at OFA Sonoma Workshop Message-ID: <20080402095057.360cbff1.weiny2@llnl.gov> Just a reminder that we are going to have a BOF for OpenSM Monday the 7th at 6:30pm; room is TBA. Please come and share your use, experience and desires for OpenSM. Or if you have yet to try OpenSM, listen in on what others are doing with it. Thanks, Ira Weiny Comp Sci./Math Prog. Lawrence Livermore National Lab weiny2 at llnl.gov From richard.frank at oracle.com Wed Apr 2 11:00:27 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 13:00:27 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> <47F3C469.1020803@oracle.com> Message-ID: <47F3C9BB.5040009@oracle.com> OK - and the conversation was about using NetPerf to compare performance of RDS to UDP relative to suitability for Oracle use ... so I think those statements still illustrate my points... 1) NetPerf does not do what Oracle does - and hence is not useful from Oracle's perspective in comparing ULPs. 2) For some metrics - it's not valid to compare a non-reliable IPC to a reliable IPC - it's not an apples to apples comparison. Especially when the app is considered and what the app must do to use UDP vs RDS. I did not say that NetPerf should not be extended to support RDS - just that using it to do a comparison of ULPs to determine how well Oracle would run - is not what we (Oracle) would want - at least that was my intention.. 
Scott Weitzenkamp (sweitzen) wrote: > Rich, > > On Nov 1, 2007, you wrote this to rds-devel: > > "Netperf is too simplistic in that all it seems to do is stream data > in a > simple loop. This is not how Oracle uses the IPC and again does not > reflect what it would take to make UDP reliable. > > For this reason we are not interested in having Netperf support RDS > and > or seeing Netperf data." > > I would like to see RDS supported by existing common tools like netperf, > iperf, etc. so we can easily compare how RDS performs to UDP for IPC > models other than Oracle. > > Scott Weitzenkamp > SQA and Release Manager > Data Center Access Engineering > Cisco Systems > > > > > >> -----Original Message----- >> From: Richard Frank [mailto:richard.frank at oracle.com] >> Sent: Wednesday, April 02, 2008 10:38 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: Roland Dreier (rdreier); rds-devel at oss.oracle.com; >> linux-kernel at vger.kernel.org; general at lists.openfabrics.org >> Subject: Re: [ofa-general] InfiniBand/iWARP/RDMA merge plans >> for 2.6.26 (what'sin infiniband.git) >> >> I believe there is a patch for NetPerf which supports RDS - >> although it >> may need to be updated - and submitted. >> >> The only prior discussion I can think of - was whether or not NetPerf >> exercises RDS as Oracle would. >> >> I'm not proposing that we should enhance NetPerf to do that >> (but that's >> OK with me). >> >> We created a tool rds-stress which does that. >> >> Scott Weitzenkamp (sweitzen) wrote: >> >>>> WRT to merging RDS into the kernel - our current plans are >>>> >> to wait to >> >>>> see RDS adopted by more than Oracle - before approaching >>>> >> the kernel >> >>>> community about inclusion of RDS. >>>> >>>> >>> I've seen statements before from someone from Oracle that >>> >> RDS was only >> >>> for Oracle's use, for example, that person did not want >>> >> netperf changed >> >>> to support RDS. 
>>> >>> Scott Weitzenkamp >>> SQA and Release Manager >>> Data Center Access Engineering >>> Cisco Systems >>> >>> From sweitzen at cisco.com Wed Apr 2 10:04:23 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 10:04:23 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: <47F3C9BB.5040009@oracle.com> References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> <47F3C469.1020803@oracle.com> <47F3C9BB.5040009@oracle.com> Message-ID: I'd like to see netperf comparisions of UDP_STREAM/UDP_RR vs RDS_STREAM/RDS_RR, does anyone have a patch that will apply cleanly to a recent netperf? Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems > -----Original Message----- > From: Richard Frank [mailto:richard.frank at oracle.com] > Sent: Wednesday, April 02, 2008 11:00 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Roland Dreier (rdreier); rds-devel at oss.oracle.com; > linux-kernel at vger.kernel.org; general at lists.openfabrics.org > Subject: Re: [ofa-general] InfiniBand/iWARP/RDMA merge plans > for 2.6.26 (what'sin infiniband.git) > > OK - and the conversation was about using NetPerf to compare > performance > of RDS to UDP relative to suitability for Oracle use ... so I think > those statements still illustrate my points... > > 1) NetPerf does not do what Oracle does - and hence is not > useful from > Oracle's perspective in comparing ULPs. > 2) For some metrics - it's not valid to compare a > non-reliable IPC to a > reliable IPC - it's not an apples to apples comparison. > Especially when > the app is considered and what the app must do to use UDP vs RDS. > > I did not say that NetPerf should not be extended to support > RDS - just > that using it to do a comparison of ULPs to determine how well Oracle > would run - is not what we (Oracle) would want - at least that was my > intention.. 
> > Scott Weitzenkamp (sweitzen) wrote: > > Rich, > > > > On Nov 1, 2007, you wrote this to rds-devel: > > > > "Netperf is too simplistic in that all it seems to do is > stream data > > in a > > simple loop. This is not how Oracle uses the IPC and > again does not > > reflect what it would take to make UDP reliable. > > > > For this reason we are not interested in having Netperf > support RDS > > and > > or seeing Netperf data." > > > > I would like to see RDS supported by existing common tools > like netperf, > > iperf, etc. so we can easily compare how RDS performs to UDP for IPC > > models other than Oracle. > > > > Scott Weitzenkamp > > SQA and Release Manager > > Data Center Access Engineering > > Cisco Systems > > > > > > > > > > > >> -----Original Message----- > >> From: Richard Frank [mailto:richard.frank at oracle.com] > >> Sent: Wednesday, April 02, 2008 10:38 AM > >> To: Scott Weitzenkamp (sweitzen) > >> Cc: Roland Dreier (rdreier); rds-devel at oss.oracle.com; > >> linux-kernel at vger.kernel.org; general at lists.openfabrics.org > >> Subject: Re: [ofa-general] InfiniBand/iWARP/RDMA merge plans > >> for 2.6.26 (what'sin infiniband.git) > >> > >> I believe there is a patch for NetPerf which supports RDS - > >> although it > >> may need to be updated - and submitted. > >> > >> The only prior discussion I can think of - was whether or > not NetPerf > >> exercises RDS as Oracle would. > >> > >> I'm not proposing that we should enhance NetPerf to do that > >> (but that's > >> OK with me). > >> > >> We created a tool rds-stress which does that. > >> > >> Scott Weitzenkamp (sweitzen) wrote: > >> > >>>> WRT to merging RDS into the kernel - our current plans are > >>>> > >> to wait to > >> > >>>> see RDS adopted by more than Oracle - before approaching > >>>> > >> the kernel > >> > >>>> community about inclusion of RDS. 
> >>>> > >>>> > >>> I've seen statements before from someone from Oracle that > >>> > >> RDS was only > >> > >>> for Oracle's use, for example, that person did not want > >>> > >> netperf changed > >> > >>> to support RDS. > >>> > >>> Scott Weitzenkamp > >>> SQA and Release Manager > >>> Data Center Access Engineering > >>> Cisco Systems > >>> > >>> > From richard.frank at oracle.com Wed Apr 2 11:03:53 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 13:03:53 -0500 Subject: [ofa-general] Re: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F3C5D1.5000003@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> Message-ID: <47F3CA89.9080406@oracle.com> RDS does not run over regular 10G NICs - that appear as simple NICS - this was disabled in 1.3. For now we are interested in RDS over IWARP NICS - configured as accessible via the verbs interfaces. Richard Frank wrote: > is the rds driver loaded (modprobe rds) > > Scott Weitzenkamp (sweitzen) wrote: > >> Does't appear to work with Chelsio and OFED 1.3: >> >> [root at svbu-qa2950-1 counters]# ethtool -i eth2 >> driver: cxgb3 >> version: 1.0-ofed >> firmware-version: T 5.0.0 TP 1.1.0 >> bus-info: 0000:0b:00.0 >> [root at svbu-qa2950-1 counters]# ifconfig eth2 >> eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F >> inet addr:192.168.0.198 Bcast:192.168.0.255 >> Mask:255.255.255.0 >> inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link >> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >> RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 >> TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 >> collisions:0 txqueuelen:1000 >> RX bytes:207891512 (198.2 MiB) TX bytes:9348152 (8.9 MiB) >> Interrupt:169 Memory:fceff000-fcefffff >> >> [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 >> rds-sink: Unable to bind socket: Cannot assign requested address >> >> Scott Weitzenkamp >> SQA and Release Manager >> Data 
Center Access Engineering >> Cisco Systems >> >> >> >> >> >> >>> -----Original Message----- >>> From: rds-devel-bounces at oss.oracle.com >>> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of Richard Frank >>> Sent: Wednesday, April 02, 2008 10:31 AM >>> To: rds-devel at oss.oracle.com; [ofa_general] >>> Subject: [rds-devel] Has anyone tried running RDS over 10GE / >>> IWARP NICs ? >>> >>> We'd appreciate some feed back on your experience and would >>> like to sort >>> out any issues ASAP. >>> >>> Rick >>> >>> _______________________________________________ >>> rds-devel mailing list >>> rds-devel at oss.oracle.com >>> http://oss.oracle.com/mailman/listinfo/rds-devel >>> >>> >>> > > _______________________________________________ > rds-devel mailing list > rds-devel at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/rds-devel > From sweitzen at cisco.com Wed Apr 2 10:09:14 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 10:09:14 -0700 Subject: [ofa-general] RE: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F3CA89.9080406@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> Message-ID: Yes, it's an iWARP NIC, and the OFED 1.3 perftest ib_rdma_lat program is working. Scott > -----Original Message----- > From: Richard Frank [mailto:richard.frank at oracle.com] > Sent: Wednesday, April 02, 2008 11:04 AM > To: Scott Weitzenkamp (sweitzen) > Cc: rds-devel at oss.oracle.com; [ofa_general] > Subject: Re: [rds-devel] Has anyone tried running RDS over > 10GE / IWARP NICs ? > > RDS does not run over regular 10G NICs - that appear as simple NICS - > this was disabled in 1.3. > > For now we are interested in RDS over IWARP NICS - configured as > accessible via the verbs interfaces. 
> > Richard Frank wrote: > > is the rds driver loaded (modprobe rds) > > > > Scott Weitzenkamp (sweitzen) wrote: > > > >> Does't appear to work with Chelsio and OFED 1.3: > >> > >> [root at svbu-qa2950-1 counters]# ethtool -i eth2 > >> driver: cxgb3 > >> version: 1.0-ofed > >> firmware-version: T 5.0.0 TP 1.1.0 > >> bus-info: 0000:0b:00.0 > >> [root at svbu-qa2950-1 counters]# ifconfig eth2 > >> eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F > >> inet addr:192.168.0.198 Bcast:192.168.0.255 > >> Mask:255.255.255.0 > >> inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link > >> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > >> RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 > >> TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 > >> collisions:0 txqueuelen:1000 > >> RX bytes:207891512 (198.2 MiB) TX bytes:9348152 > (8.9 MiB) > >> Interrupt:169 Memory:fceff000-fcefffff > >> > >> [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 > >> rds-sink: Unable to bind socket: Cannot assign requested address > >> > >> Scott Weitzenkamp > >> SQA and Release Manager > >> Data Center Access Engineering > >> Cisco Systems > >> > >> > >> > >> > >> > >> > >>> -----Original Message----- > >>> From: rds-devel-bounces at oss.oracle.com > >>> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of > Richard Frank > >>> Sent: Wednesday, April 02, 2008 10:31 AM > >>> To: rds-devel at oss.oracle.com; [ofa_general] > >>> Subject: [rds-devel] Has anyone tried running RDS over 10GE / > >>> IWARP NICs ? > >>> > >>> We'd appreciate some feed back on your experience and would > >>> like to sort > >>> out any issues ASAP. 
> >>> > >>> Rick > >>> > >>> _______________________________________________ > >>> rds-devel mailing list > >>> rds-devel at oss.oracle.com > >>> http://oss.oracle.com/mailman/listinfo/rds-devel > >>> > >>> > >>> > > > > _______________________________________________ > > rds-devel mailing list > > rds-devel at oss.oracle.com > > http://oss.oracle.com/mailman/listinfo/rds-devel > > > From Thomas.Talpey at netapp.com Wed Apr 2 10:21:39 2008 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 02 Apr 2008 13:21:39 -0400 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: <47F33837.60701@dev.mellanox.co.il> References: <47F33837.60701@dev.mellanox.co.il> Message-ID: At 03:39 AM 4/2/2008, Dotan Barak wrote: >If the value of device_attributes.num_mw is more than zero => the device >supports memory windows, so i think this flag >can be safely removed. I agree with removing the flag, but if you mean "max_mw", looking at the tree, there are a few problems with the > zero assertion. :-) drivers/infiniband/hw/ehca/ehca_hca.c 376: props->max_mw = min_t(unsigned, rblock->max_mw, INT_MAX); drivers/infiniband/hw/nes/nes_verbs.c 3915: props->max_mw = nesibdev->max_mr; Note, ehca may set it to huge negative values, and nes puts the wrong value in the attribute field! (typo?) The good news is, the AMSO1100 seems to get it right. ;-) I'm still looking to be able to test the NFS/RDMA client over memory windows. The code's all there in the RPC layer, just not in the providers. Tom. 
From Jeffrey.C.Becker at nasa.gov Wed Apr 2 10:41:58 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Wed, 02 Apr 2008 10:41:58 -0700 Subject: [ofa-general] Spam on mailing list general@openib.org In-Reply-To: <47F1D77F.7030104@voltaire.com> References: <47EBB5A0.6030000@isomerica.net> <20080327152720.GB24509@cefeid.wcss.wroc.pl> <47EBBE4B.5090706@isomerica.net> <47EBCF04.6040208@nasa.gov> <47F11D54.601@nasa.gov> <47F1D77F.7030104@voltaire.com> Message-ID: <47F3C566.6000602@nasa.gov> Since I didn't hear any other votes, I commented out (turned off) the old openib lists. I can reinstate them if needed. I hope this helps. Thanks. -jeff Or Gerlitz wrote: > Jeff Becker wrote: >> Hi. Since valid stuff is being rejected, I reset the SPAM filter for >> general to its previous setting. As Tom noted, most of the spam comes >> through the old openib.org lists. Is there any reason to keep these? >> If not, I can see about turning them off in order to improve the >> situation. Thanks. > I vote for turning off the old openib.org lists, best if you can set > some automatic reply on them redirecting the sender to the > openfabrics.org lists, etc. > > Or. > From andrea at qumranet.com Wed Apr 2 10:50:58 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 19:50:58 +0200 Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: <20080401205636.777127252@sgi.com> References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> Message-ID: <20080402175058.GR19189@duo.random> On Tue, Apr 01, 2008 at 01:55:36PM -0700, Christoph Lameter wrote: > This results in f.e. the Aim9 brk performance test to got down by 10-15%. I guess it's more likely because of overscheduling for small critical sections; did you count the total number of context switches? I guess there will be a lot more with your patch applied. 
That regression is a showstopper and it is the reason why I've suggested before to add a CONFIG_XPMEM or CONFIG_MMU_NOTIFIER_SLEEP config option to make the VM locks sleep capable only when XPMEM=y (PREEMPT_RT will enable it too). Thanks for doing the benchmark work! From clameter at sgi.com Wed Apr 2 10:59:50 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 10:59:50 -0700 (PDT) Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402064952.GF19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > There are much bigger issues besides the rcu safety in this patch, > proper aging of the secondary mmu through access bits set by hardware > is unfixable with this model (you would need to do age |= > e->callback), which is the proof of why this isn't flexibile enough by > forcing the same parameter and retvals for all methods. No idea why > you go for such inferior solution that will never get the aging right > and will likely fall apart if we add more methods in the future. There is always the possibility to add special functions in the same way as done in the mmu notifier series if it really becomes necessary. EMM in no way precludes that. Here, f.e., we can add a special emm_age() function that iterates differently and does the | for you. > For example the "switch" you have to add in > xpmem_emm_notifier_callback doesn't look good, at least gcc may be > able to optimize it with an array indexing simulating proper pointer > to function like in #v9. Actually the switch looks really good because it allows code to run for all callbacks like f.e. xpmem_tg_ref(). Otherwise the refcounting code would have to be added to each callback. > > Most other patches will apply cleanly on top of my coming mmu > notifiers #v10 that I hope will go in -mm. 
> > For #v10 the only two left open issues to discuss are: Did I see #v10? Could you start a new subject when you post please? Do not respond to some old message otherwise the threading will be wrong. > methods will be correctly replied allowing GRU not to corrupt > memory after the registration method. EMM would also need a fix > like this for GRU to be safe on top of EMM. How exactly does the GRU corrupt memory? > Another less obviously safe approach is to allow the register > method to succeed only when mm_users=1 and the task is single > threaded. This way if all the places where the mmu notifers aren't > invoked on the mm not by the current task, are only doing > invalidates after/before zapping ptes, if the istantiation of new > ptes is single threaded too, we shouldn't worry if we miss an > invalidate for a pte that is zero and doesn't point to any physical > page. In the places where current->mm != mm I'm using > invalidate_page 99% of the time, and that only follows the > ptep_clear_flush. The problem are the range_begin that will happen > before zapping the pte in places where current->mm != > mm. Unfortunately in my incremental patch where I move all > invalidate_page outside of the PT lock to prepare for allowing > sleeping inside the mmu notifiers, I used range_begin/end in places > like try_to_unmap_cluster where current->mm != mm. In general > this solution looks more fragile than the seqlock. Hmmm... Okay that is one solution that would just require a BUG_ON in the registration methods. > 2) I'm uncertain how the driver can handle a range_end called before > range_begin. Also multiple range_begin can happen in parallel later > followed by range_end, so if there's a global seqlock that > serializes the secondary mmu page fault, that will screwup (you > can't seqlock_write in range_begin and sequnlock_write in > range_end). 
The write side of the seqlock must be serialized and > calling seqlock_write twice in a row before any sequnlock operation > will break. Well, doesn't the requirement of just one execution thread also deal with that issue? From clameter at sgi.com Wed Apr 2 11:15:26 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 11:15:26 -0700 (PDT) Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: <20080402175058.GR19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> <20080402175058.GR19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > On Tue, Apr 01, 2008 at 01:55:36PM -0700, Christoph Lameter wrote: > > This results in f.e. the Aim9 brk performance test to got down by 10-15%. > > I guess it's more likely because of overscheduling for small crtitical > sections, did you counted the total number of context switches? I > guess there will be a lot more with your patch applied. That > regression is a showstopper and it is the reason why I've suggested > before to add a CONFIG_XPMEM or CONFIG_MMU_NOTIFIER_SLEEP config > option to make the VM locks sleep capable only when XPMEM=y > (PREEMPT_RT will enable it too). Thanks for doing the benchmark work! There are more context switches if locks are contended. But that actually also has some good aspects, because we avoid busy loops and can potentially continue work in another process. From clameter at sgi.com Wed Apr 2 12:03:50 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 12:03:50 -0700 (PDT) Subject: [ofa-general] EMM: Fixup return value handling of emm_notify() In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Christoph Lameter wrote: > Here f.e. We can add a special emm_age() function that iterates > differently and does the | for you. 
Well, maybe not really necessary. How about this fix? It's likely a problem to stop callbacks if one callback returned an error. Subject: EMM: Fixup return value handling of emm_notify() Right now we stop calling additional subsystems if one callback returned an error. That has the potential for causing additional trouble with the subsystems that do not receive the callbacks they expect if one has failed. So change the handling of error codes to continue callbacks to other subsystems but return the first error code encountered. If a callback returns a positive return value then add up all the values from all the calls. That can be used to establish how many references exist (xpmem may want this feature at some point) or ensure that the aging works the way Andrea wants it to (KVM, XPmem so far do not care too much). Signed-off-by: Christoph Lameter --- mm/rmap.c | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c 2008-04-02 11:46:20.738342852 -0700 +++ linux-2.6/mm/rmap.c 2008-04-02 12:03:57.672494320 -0700 @@ -299,27 +299,45 @@ void emm_notifier_register(struct emm_no } EXPORT_SYMBOL_GPL(emm_notifier_register); -/* Perform a callback */ +/* + * Perform a callback + * + * The return of this function is either a negative error of the first + * callback that failed or a consolidated count of all the positive + * values that were returned by the callbacks. + */ int __emm_notify(struct mm_struct *mm, enum emm_operation op, unsigned long start, unsigned long end) { struct emm_notifier *e = rcu_dereference(mm->emm_notifier); int x; + int result = 0; while (e) { - if (e->callback) { x = e->callback(e, mm, op, start, end); - if (x) - return x; + + /* + * Callback may return a positive value to indicate a count + * or a negative error code. 
We keep the first error code + * but continue to perform callbacks to other subscribed + * subsystems. + */ + if (x && result >= 0) { + if (x >= 0) + result += x; + else + result = x; + } } + /* * emm_notifier contents (e) must be fetched after * the retrival of the pointer to the notifier. */ e = rcu_dereference(e->next); } - return 0; + return result; } EXPORT_SYMBOL_GPL(__emm_notify); #endif From clameter at sgi.com Wed Apr 2 14:05:28 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 14:05:28 -0700 (PDT) Subject: [ofa-general] EMM: Require single threadedness for registration. In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: Here is a patch to require single threaded execution during emm_register. This also allows an easy implementation of an unregister function and gets rid of the races that Andrea worried about. The approach here is similar to what was used in selinux for security context changes (see selinux_setprocattr). Is it okay for the users of emm to require single threadedness for registration? Subject: EMM: Require single threaded execution for register and unregister We can avoid the concurrency issues arising at registration if we only allow registration of notifiers when the process has only a single thread. That even allows to avoid the use of rcu. Signed-off-by: Christoph Lameter --- mm/rmap.c | 46 +++++++++++++++++++++++++++++++++++++--------- 1 file changed, 37 insertions(+), 9 deletions(-) Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c 2008-04-02 13:53:46.002473685 -0700 +++ linux-2.6/mm/rmap.c 2008-04-02 14:03:05.872199896 -0700 @@ -286,20 +286,48 @@ void emm_notifier_release(struct mm_stru } } -/* Register a notifier */ +/* + * Register a notifier + * + * mmap_sem is held writably. + * + * Process must be single threaded. 
+ */ void emm_notifier_register(struct emm_notifier *e, struct mm_struct *mm) { + BUG_ON(atomic_read(&mm->mm_users) != 1); + e->next = mm->emm_notifier; - /* - * The update to emm_notifier (e->next) must be visible - * before the pointer becomes visible. - * rcu_assign_pointer() does exactly what we need. - */ - rcu_assign_pointer(mm->emm_notifier, e); + mm->emm_notifier = e; } EXPORT_SYMBOL_GPL(emm_notifier_register); /* + * Unregister a notifier + * + * mmap_sem is held writably + * + * Process must be single threaded + */ +void emm_notifier_unregister(struct emm_notifier *e, struct mm_struct *mm) +{ + struct emm_notifier *p = mm->emm_notifier; + + BUG_ON(atomic_read(&mm->mm_users) != 1); + + if (e == p) + mm->emm_notifier = e->next; + else { + while (p->next != e) + p = p->next; + + p->next = e->next; + } + e->callback(e, mm, emm_release, 0, TASK_SIZE); +} +EXPORT_SYMBOL_GPL(emm_notifier_unregister); + +/* * Perform a callback * * The return of this function is either a negative error of the first @@ -309,7 +337,7 @@ EXPORT_SYMBOL_GPL(emm_notifier_register) int __emm_notify(struct mm_struct *mm, enum emm_operation op, unsigned long start, unsigned long end) { - struct emm_notifier *e = rcu_dereference(mm->emm_notifier); + struct emm_notifier *e = mm->emm_notifier; int x; int result = 0; @@ -335,7 +363,7 @@ int __emm_notify(struct mm_struct *mm, e * emm_notifier contents (e) must be fetched after * the retrival of the pointer to the notifier. 
*/ - e = rcu_dereference(e->next); + e = e->next; } return result; } From andrea at qumranet.com Wed Apr 2 14:25:15 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 23:25:15 +0200 Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: <20080402212515.GS19189@duo.random> On Wed, Apr 02, 2008 at 12:03:50PM -0700, Christoph Lameter wrote: > + /* > + * Callback may return a positive value to indicate a count > + * or a negative error code. We keep the first error code > + * but continue to perform callbacks to other subscribed > + * subsystems. > + */ > + if (x && result >= 0) { > + if (x >= 0) > + result += x; > + else > + result = x; > + } > } > + Now think of when one of the kernel janitors will micro-optimize PG_dirty to be returned by invalidate_page so a single set_page_dirty will be invoked... Keep in mind this is a kernel-internal API; ask Greg if we can change it in order to optimize later in the future. I think my #v9 is optimal enough while being simple at the same time, but anyway it's silly to be hardwired to such an interface that worst of all requires switch statements instead of proper pointers to functions and a fixed set of parameters and retval semantics for all methods. 
From clameter at sgi.com Wed Apr 2 14:33:51 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 14:33:51 -0700 (PDT) Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: <20080402212515.GS19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402212515.GS19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > but anyway it's silly to be hardwired to such an interface that worst > of all requires switch statements instead of proper pointer to > functions and a fixed set of parameters and retval semantics for all > methods. The EMM API with a single callback is the simplest approach at this point. A common callback for all operations allows the driver to implement common entry and exit code as seen in XPMem. I guess we can complicate this more by switching to a different API or adding additional emm_xxx() callback if need be but I really want to have a strong case for why this would be needed. There is the danger of adding frills with special callbacks in this and that situation that could make the notifier complicated and specific to a certain usage scenario. Having this generic simple interface will hopefully avoid such things. From andrea at qumranet.com Wed Apr 2 14:30:01 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:01 +0200 Subject: [ofa-general] [PATCH 0 of 8] mmu notifiers #v10 Message-ID: Hello, this is the mmu notifier #v10. Patches 1 and 2 are the only difference between this and EMM V2. The rest is the same as with Christoph's patches. I think maximum priority should be given in merging patch 1 and 2 into -mm and ASAP in mainline. 
Patches from 3 to 8 can go in -mm for testing but I'm not sure if we should support sleep capable notifiers in mainline unless we make the VM locking conditional to avoid overscheduling for extremely small critical sections in the common case. I only rediffed Christoph's patches on top of the mmu notifier patches. KVM current plans are to heavily depend on mmu notifiers for swapping, to optimize the spte faults, and we need it for smp guest ballooning with madvise(DONT_NEED) and other optimizations and features. Patches from 3 to 8 are Christoph's work ported on top of #v10 to make the #v10 mmu notifiers sleep capable (at least supposedly). I didn't test the scheduling, but I assume you'll quickly test XPMEM on top of this. From andrea at qumranet.com Wed Apr 2 14:30:02 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:02 +0200 Subject: [ofa-general] [PATCH 1 of 8] Core of mmu notifiers In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207158873 -7200 # Node ID a406c0cc686d0ca94a4d890d661cdfa48cfba09f # Parent 249e077dc932a5322e04ac1d69326622ea4023b8 Core of mmu notifiers. 
Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -225,6 +226,10 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct hlist_head mmu_notifier_list; + seqlock_t mmu_notifier_lock; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,181 @@ +#ifndef _LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; + +#ifdef CONFIG_MMU_NOTIFIER + +struct mmu_notifier_ops { + /* + * Called when nobody can register any more notifier in the mm + * and after the "mn" notifier has been disarmed already. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired. Multiple invalidate_range_start/ends may be nested + * or called concurrently. 
+ */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(!hlist_empty(&mm->mmu_notifier_list)); +} + +/* + * Must hold the mmap_sem for write. + * + * RCU is used to traverse the list. + */ +extern void mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) 
+ __mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ + INIT_HLIST_HEAD(&mm->mmu_notifier_list); + seqlock_init(&mm->mmu_notifier_lock); +} + +#define ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } diff --git a/mm/Kconfig 
b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ -193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,4 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o - +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); 
put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -611,6 +612,9 @@ if (is_vm_hugetlb_page(vma)) return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { @@ -621,6 +625,11 @@ vma, addr, next)) return -ENOMEM; } while (dst_pgd++, src_pgd++, addr = next, addr != end); + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); + return 0; } @@ -897,7 +906,9 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, address, end); end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); + mmu_notifier_invalidate_range_end(mm, address, end); if (tlb) tlb_finish_mmu(tlb, address, end); return end; @@ -1463,10 +1474,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1486,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1688,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -1747,11 +1748,13 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); tlb_finish_mmu(tlb, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2037,6 +2040,7 @@ unsigned long end; /* mm's last user has gone, and its about to be pulled down */ + mmu_notifier_release(mm); arch_exit_mmap(mm); lru_add_drain(); diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,121 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include +#include +#include + +/* + * No synchronization. This function can only be called when only a single + * process remains that performs teardown. 
+ */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { + mn = hlist_entry(mm->mmu_notifier_list.first, + struct mmu_notifier, + hlist); + hlist_del(&mn->hlist); + if (mn->ops->release) + mn->ops->release(mn, mm); + BUG_ON(read_seqretry(&mm->mmu_notifier_lock, seq)); + } +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending if the mapping previously + * existed or not. + */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + do { + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, + address); + } + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + do { + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + do { + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, + start, end); + } + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); +} + +void 
__mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + do { + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, + start, end); + } + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); +} + +/* + * Must hold mmap_sem writably when calling registration functions. + */ +void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + write_seqlock(&mm->mmu_notifier_lock); + hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_list); + write_sequnlock(&mm->mmu_notifier_lock); +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + 
mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. 
*/ if (page->index != linear_page_index(vma, address)) From andrea at qumranet.com Wed Apr 2 14:30:03 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:03 +0200 Subject: [ofa-general] [PATCH 2 of 8] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207159010 -7200 # Node ID fe00cb9deeb31467396370c835cb808f4b85209a # Parent a406c0cc686d0ca94a4d890d661cdfa48cfba09f Moves all mmu notifier methods outside the PT lock (first and not last step to make them sleep capable). Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -121,27 +121,6 @@ seqlock_init(&mm->mmu_notifier_lock); } -#define ptep_clear_flush_notify(__vma, __address, __ptep) \ -({ \ - pte_t __pte; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __pte = ptep_clear_flush(___vma, ___address, __ptep); \ - mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ - __pte; \ -}) - -#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ -({ \ - int __young; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ - __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ - ___address); \ - __young; \ -}) - #else /* CONFIG_MMU_NOTIFIER */ static inline void mmu_notifier_release(struct mm_struct *mm) @@ -173,9 +152,6 @@ { } -#define ptep_clear_flush_young_notify ptep_clear_flush_young -#define ptep_clear_flush_notify ptep_clear_flush - #endif /* CONFIG_MMU_NOTIFIER */ #endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,11 +194,13 @@ if (pte) { /* Nuke the page table entry. 
*/ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); pte_unmap_unlock(pte, ptl); + /* must invalidate_page _before_ freeing the page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(page); } } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -1626,9 +1626,10 @@ */ page_table = pte_offset_map_lock(mm, pmd, address, &ptl); - page_cache_release(old_page); + new_page = NULL; if (!pte_same(*page_table, orig_pte)) goto unlock; + page_cache_release(old_page); page_mkwrite = 1; } @@ -1644,6 +1645,7 @@ if (ptep_set_access_flags(vma, address, page_table, entry,1)) update_mmu_cache(vma, address, entry); ret |= VM_FAULT_WRITE; + old_page = new_page = NULL; goto unlock; } @@ -1688,7 +1690,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush_notify(vma, address, page_table); + ptep_clear_flush(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); @@ -1700,12 +1702,18 @@ } else mem_cgroup_uncharge_page(new_page); - if (new_page) +unlock: + pte_unmap_unlock(page_table, ptl); + + if (new_page) { + if (new_page == old_page) + /* cow happened, notify before releasing old_page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(new_page); + } if (old_page) page_cache_release(old_page); -unlock: - pte_unmap_unlock(page_table, ptl); + if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -275,7 +275,7 @@ unsigned long address; pte_t *pte; spinlock_t *ptl; - int referenced = 0; + int referenced = 0, clear_flush_young = 0; address = vma_address(page, vma); if (address == -EFAULT) @@ -288,8 +288,11 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young_notify(vma, address, pte)) - referenced++; + } else { + clear_flush_young = 1; + if (ptep_clear_flush_young(vma, address, pte)) + referenced++; + } /* Pretend the page is referenced if the task has the swap token and is in the middle of a page fault. 
*/ @@ -299,6 +302,10 @@ (*mapcount)--; pte_unmap_unlock(pte, ptl); + + if (clear_flush_young) + referenced += mmu_notifier_clear_flush_young(mm, address); + out: return referenced; } @@ -457,7 +464,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush_notify(vma, address, pte); + entry = ptep_clear_flush(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -465,6 +472,10 @@ } pte_unmap_unlock(pte, ptl); + + if (ret) + mmu_notifier_invalidate_page(mm, address); + out: return ret; } @@ -717,15 +728,14 @@ * If it's recently referenced (perhaps page_referenced * skipped over this mm) then we should reactivate it. */ - if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young_notify(vma, address, pte)))) { + if (!migration && (vma->vm_flags & VM_LOCKED)) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. 
*/ if (pte_dirty(pteval)) @@ -780,6 +790,8 @@ out_unmap: pte_unmap_unlock(pte, ptl); + if (ret != SWAP_FAIL) + mmu_notifier_invalidate_page(mm, address); out: return ret; } @@ -818,7 +830,7 @@ spinlock_t *ptl; struct page *page; unsigned long address; - unsigned long end; + unsigned long start, end; address = (vma->vm_start + cursor) & CLUSTER_MASK; end = address + CLUSTER_SIZE; @@ -839,6 +851,8 @@ if (!pmd_present(*pmd)) return; + start = address; + mmu_notifier_invalidate_range_start(mm, start, end); pte = pte_offset_map_lock(mm, pmd, address, &ptl); /* Update high watermark before we lower rss */ @@ -850,12 +864,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young_notify(vma, address, pte)) + if (ptep_clear_flush_young(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) @@ -871,6 +885,7 @@ (*mapcount)--; } pte_unmap_unlock(pte - 1, ptl); + mmu_notifier_invalidate_range_end(mm, start, end); } static int try_to_unmap_anon(struct page *page, int migration) From andrea at qumranet.com Wed Apr 2 14:30:04 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:04 +0200 Subject: [ofa-general] [PATCH 3 of 8] Move the tlb flushing into free_pgtables. The conversion of the locks In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207159010 -7200 # Node ID d880c227ddf345f5d577839d36d150c37b653bfd # Parent fe00cb9deeb31467396370c835cb808f4b85209a Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map scanning would require taking sleeping locks in free_pgtables(). Moving the tlb flushing into free_pgtables allows sleeping in parts of free_pgtables(). 
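The gather-per-iteration pattern this patch introduces (open a TLB gather, batch the frees, flush, then allow sleeping before the next iteration) can be modeled in plain userspace C. This is only an illustrative sketch, not kernel code: `gather_start`, `gather_finish`, and `free_all` are invented names standing in for `tlb_gather_mmu()`, `tlb_finish_mmu()`, and the `free_pgtables()` loop.

```c
#include <assert.h>
#include <stddef.h>

struct gather {
	int batched;	/* pages collected since gather_start() */
	int *flushes;	/* external counter of completed flushes */
};

static struct gather gather_start(int *flush_counter)
{
	struct gather g = { 0, flush_counter };
	return g;
}

static void gather_page(struct gather *g)
{
	g->batched++;
}

static void gather_finish(struct gather *g)
{
	if (g->batched)
		(*g->flushes)++;	/* one "TLB flush" retires the whole batch */
	g->batched = 0;
}

/* free_pgtables()-style loop: each "vma" opens and closes its own
 * gather, so the code between iterations is free to sleep. */
static int free_all(const int *pages_per_vma, size_t nvmas)
{
	int flushes = 0;

	for (size_t i = 0; i < nvmas; i++) {
		struct gather g = gather_start(&flushes);

		for (int p = 0; p < pages_per_vma[i]; p++)
			gather_page(&g);
		gather_finish(&g);	/* flush before potentially sleeping */
	}
	return flushes;
}
```

The cost the commit message concedes shows up directly in the model: one flush per VMA instead of one flush per batch of VMAs.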
This means that we do a tlb_finish_mmu() before freeing the page tables.
Strictly speaking there may not be the need to do another tlb flush after
freeing the tables. But it's the only way to free a series of page table
pages from the tlb list. And we do not want to call into the page allocator
for performance reasons.

Aim9 numbers look okay after this patch.

Signed-off-by: Christoph Lameter

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -751,8 +751,8 @@
 			void *private);
 void free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma,
-		unsigned long floor, unsigned long ceiling);
+void free_pgtables(struct vm_area_struct *start_vma, unsigned long floor,
+		unsigned long ceiling);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 		struct vm_area_struct *vma);
 void unmap_mapping_range(struct address_space *mapping,
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -272,9 +272,11 @@
 	} while (pgd++, addr = next, addr != end);
 }

-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
-		unsigned long floor, unsigned long ceiling)
+void free_pgtables(struct vm_area_struct *vma, unsigned long floor,
+		unsigned long ceiling)
 {
+	struct mmu_gather *tlb;
+
 	while (vma) {
 		struct vm_area_struct *next = vma->vm_next;
 		unsigned long addr = vma->vm_start;
@@ -286,8 +288,10 @@
 		unlink_file_vma(vma);

 		if (is_vm_hugetlb_page(vma)) {
-			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
+			tlb = tlb_gather_mmu(vma->vm_mm, 0);
+			hugetlb_free_pgd_range(&tlb, addr, vma->vm_end,
				floor, next?
next->vm_start: ceiling); + tlb_finish_mmu(tlb, addr, vma->vm_end); } else { /* * Optimization: gather nearby vmas into one call down @@ -299,8 +303,10 @@ anon_vma_unlink(vma); unlink_file_vma(vma); } - free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + free_pgd_range(&tlb, addr, vma->vm_end, floor, next? next->vm_start: ceiling); + tlb_finish_mmu(tlb, addr, vma->vm_end); } vma = next; } diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1751,9 +1751,9 @@ mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, + tlb_finish_mmu(tlb, start, end); + free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - tlb_finish_mmu(tlb, start, end); mmu_notifier_invalidate_range_end(mm, start, end); } @@ -2050,8 +2050,8 @@ /* Use -1 here to ensure all VMAs in the mm are unmapped */ end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0); tlb_finish_mmu(tlb, 0, end); + free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* * Walk the list again, actually closing and freeing it, From andrea at qumranet.com Wed Apr 2 14:30:05 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:05 +0200 Subject: [ofa-general] [PATCH 4 of 8] The conversion to a rwsem allows callbacks during rmap traversal In-Reply-To: Message-ID: <3c3787c496cab1fc590b.1207171805@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207159011 -7200 # Node ID 3c3787c496cab1fc590ba3f97e7904bdfaab5375 # Parent d880c227ddf345f5d577839d36d150c37b653bfd The conversion to a rwsem allows callbacks during rmap traversal for files in a non atomic context. A rw style lock also allows concurrent walking of the reverse map. 
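The spinlock-to-rwsem conversion described above changes the lock's admission rule, not the data it protects. A minimal single-threaded model of rw-semaphore semantics (illustrative only; these userspace stand-ins merely assert the invariants that the kernel primitives enforce by blocking):

```c
#include <assert.h>

/* Model of rwsem semantics: any number of readers may hold the lock at
 * once (concurrent rmap walkers), while a writer (truncate, vma
 * insertion) requires exclusive access. */
struct rwsem {
	int readers;	/* current shared holders */
	int writer;	/* 1 while held exclusively */
};

static void down_read(struct rwsem *sem)
{
	assert(!sem->writer);	/* readers never overlap a writer */
	sem->readers++;
}

static void up_read(struct rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

static void down_write(struct rwsem *sem)
{
	assert(!sem->writer && sem->readers == 0);	/* exclusive */
	sem->writer = 1;
}

static void up_write(struct rwsem *sem)
{
	assert(sem->writer);
	sem->writer = 0;
}
```

Under the old `i_mmap_lock` spinlock, even two readers excluded each other and neither could sleep; with `i_mmap_sem`, several `down_read()` holders can walk the prio tree at once, which is exactly what sleepable mmu-notifier callbacks during rmap traversal require.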
This is fairly straightforward if one removes pieces of the resched checking. [Restarting unmapping is an issue to be discussed]. This slightly increases Aim9 performance results on an 8p. Signed-off-by: Andrea Arcangeli Signed-off-by: Christoph Lameter diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -69,7 +69,7 @@ if (!vma_shareable(vma, addr)) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -94,7 +94,7 @@ put_page(virt_to_page(spte)); spin_unlock(&mm->page_table_lock); out: - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -454,10 +454,10 @@ pgoff = offset >> PAGE_SHIFT; i_size_write(inode, offset); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); if (!prio_tree_empty(&mapping->i_mmap)) hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); truncate_hugepages(inode, offset); return 0; } diff --git a/fs/inode.c b/fs/inode.c --- a/fs/inode.c +++ b/fs/inode.c @@ -210,7 +210,7 @@ INIT_LIST_HEAD(&inode->i_devices); INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); rwlock_init(&inode->i_data.tree_lock); - spin_lock_init(&inode->i_data.i_mmap_lock); + init_rwsem(&inode->i_data.i_mmap_sem); INIT_LIST_HEAD(&inode->i_data.private_list); spin_lock_init(&inode->i_data.private_lock); INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap); diff --git a/include/linux/fs.h b/include/linux/fs.h --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -503,7 +503,7 @@ unsigned int i_mmap_writable;/* count VM_SHARED mappings */ struct prio_tree_root i_mmap; /* tree of private and shared mappings */ struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR 
mappings */ - spinlock_t i_mmap_lock; /* protect tree, count, list */ + struct rw_semaphore i_mmap_sem; /* protect tree, count, list */ unsigned int truncate_count; /* Cover race condition with truncate */ unsigned long nrpages; /* number of total pages */ pgoff_t writeback_index;/* writeback starts here */ diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -716,7 +716,7 @@ struct address_space *check_mapping; /* Check page->mapping if set */ pgoff_t first_index; /* Lowest page->index to unmap */ pgoff_t last_index; /* Highest page->index to unmap */ - spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */ + struct rw_semaphore *i_mmap_sem; /* For unmap_mapping_range: */ unsigned long truncate_count; /* Compare vm_truncate_count */ }; diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -274,12 +274,12 @@ atomic_dec(&inode->i_writecount); /* insert tmp into the share list, just after mpnt */ - spin_lock(&file->f_mapping->i_mmap_lock); + down_write(&file->f_mapping->i_mmap_sem); tmp->vm_truncate_count = mpnt->vm_truncate_count; flush_dcache_mmap_lock(file->f_mapping); vma_prio_tree_add(tmp, mpnt); flush_dcache_mmap_unlock(file->f_mapping); - spin_unlock(&file->f_mapping->i_mmap_lock); + up_write(&file->f_mapping->i_mmap_sem); } /* diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -61,16 +61,16 @@ /* * Lock ordering: * - * ->i_mmap_lock (vmtruncate) + * ->i_mmap_sem (vmtruncate) * ->private_lock (__free_pte->__set_page_dirty_buffers) * ->swap_lock (exclusive_swap_page, others) * ->mapping->tree_lock * * ->i_mutex - * ->i_mmap_lock (truncate->unmap_mapping_range) + * ->i_mmap_sem (truncate->unmap_mapping_range) * * ->mmap_sem - * ->i_mmap_lock + * ->i_mmap_sem * ->page_table_lock or pte_lock (various, mainly in memory.c) * ->mapping->tree_lock (arch-dependent flush_dcache_mmap_lock) * @@ -87,7 +87,7 @@ * ->sb_lock (fs/fs-writeback.c) * 
->mapping->tree_lock (__sync_single_inode) * - * ->i_mmap_lock + * ->i_mmap_sem * ->anon_vma.lock (vma_adjust) * * ->anon_vma.lock diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -184,7 +184,7 @@ if (!page) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { mm = vma->vm_mm; address = vma->vm_start + @@ -204,7 +204,7 @@ page_cache_release(page); } } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -206,13 +206,13 @@ } goto out; } - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); flush_dcache_mmap_lock(mapping); vma->vm_flags |= VM_NONLINEAR; vma_prio_tree_remove(vma, &mapping->i_mmap); vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } mmu_notifier_invalidate_range_start(mm, start, start + size); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -790,7 +790,7 @@ struct page *page; struct page *tmp; /* - * A page gathering list, protected by per file i_mmap_lock. The + * A page gathering list, protected by per file i_mmap_sem. The * lock is used to avoid list corruption from multiple unmapping * of the same page since we are using page->lru. */ @@ -840,9 +840,9 @@ * do nothing in this case. 
*/ if (vma->vm_file) { - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); __unmap_hugepage_range(vma, start, end); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); } } @@ -1085,7 +1085,7 @@ BUG_ON(address >= end); flush_cache_range(vma, address, end); - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); spin_lock(&mm->page_table_lock); for (; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -1100,7 +1100,7 @@ } } spin_unlock(&mm->page_table_lock); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); flush_tlb_range(vma, start, end); } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -838,7 +838,6 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; int fullmm = (*tlbp)->fullmm; for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { @@ -875,22 +874,12 @@ } tlb_finish_mmu(*tlbp, tlb_start, start); - - if (need_resched() || - (i_mmap_lock && spin_needbreak(i_mmap_lock))) { - if (i_mmap_lock) { - *tlbp = NULL; - goto out; - } - cond_resched(); - } - + cond_resched(); *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } -out: return start; /* which is now the end (or restart) address */ } @@ -1752,7 +1741,7 @@ /* * Helper functions for unmap_mapping_range(). 
* - * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __ + * __ Notes on dropping i_mmap_sem to reduce latency while unmapping __ * * We have to restart searching the prio_tree whenever we drop the lock, * since the iterator is only valid while the lock is held, and anyway @@ -1771,7 +1760,7 @@ * can't efficiently keep all vmas in step with mapping->truncate_count: * so instead reset them all whenever it wraps back to 0 (then go to 1). * mapping->truncate_count and vma->vm_truncate_count are protected by - * i_mmap_lock. + * i_mmap_sem. * * In order to make forward progress despite repeatedly restarting some * large vma, note the restart_addr from unmap_vmas when it breaks out: @@ -1821,7 +1810,7 @@ restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr, details); - need_break = need_resched() || spin_needbreak(details->i_mmap_lock); + need_break = need_resched(); if (restart_addr >= end_addr) { /* We have now completed this vma: mark it so */ @@ -1835,9 +1824,9 @@ goto again; } - spin_unlock(details->i_mmap_lock); + up_write(details->i_mmap_sem); cond_resched(); - spin_lock(details->i_mmap_lock); + down_write(details->i_mmap_sem); return -EINTR; } @@ -1931,9 +1920,9 @@ details.last_index = hba + hlen - 1; if (details.last_index < details.first_index) details.last_index = ULONG_MAX; - details.i_mmap_lock = &mapping->i_mmap_lock; + details.i_mmap_sem = &mapping->i_mmap_sem; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); /* Protect against endless unmapping loops */ mapping->truncate_count++; @@ -1948,7 +1937,7 @@ unmap_mapping_range_tree(&mapping->i_mmap, &details); if (unlikely(!list_empty(&mapping->i_mmap_nonlinear))) unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } EXPORT_SYMBOL(unmap_mapping_range); diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -211,12 +211,12 @@ if (!mapping) 
return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) remove_migration_pte(vma, old, new); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -187,7 +187,7 @@ } /* - * Requires inode->i_mapping->i_mmap_lock + * Requires inode->i_mapping->i_mmap_sem */ static void __remove_shared_vm_struct(struct vm_area_struct *vma, struct file *file, struct address_space *mapping) @@ -215,9 +215,9 @@ if (file) { struct address_space *mapping = file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); __remove_shared_vm_struct(vma, file, mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } } @@ -440,7 +440,7 @@ mapping = vma->vm_file->f_mapping; if (mapping) { - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); vma->vm_truncate_count = mapping->truncate_count; } anon_vma_lock(vma); @@ -450,7 +450,7 @@ anon_vma_unlock(vma); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mm->map_count++; validate_mm(mm); @@ -537,7 +537,7 @@ mapping = file->f_mapping; if (!(vma->vm_flags & VM_NONLINEAR)) root = &mapping->i_mmap; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (importer && vma->vm_truncate_count != next->vm_truncate_count) { /* @@ -621,7 +621,7 @@ if (anon_vma) spin_unlock(&anon_vma->lock); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); if (remove_next) { if (file) @@ -2065,7 +2065,7 @@ /* Insert vm structure into process list sorted by address * and into the inode's i_mmap tree. If vm_file is non-NULL - * then i_mmap_lock is taken here. + * then i_mmap_sem is taken here. 
*/ int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma) { diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -88,7 +88,7 @@ * and we propagate stale pages into the dst afterward. */ mapping = vma->vm_file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (new_vma->vm_truncate_count && new_vma->vm_truncate_count != vma->vm_truncate_count) new_vma->vm_truncate_count = 0; @@ -120,7 +120,7 @@ pte_unmap_nested(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -24,7 +24,7 @@ * inode->i_alloc_sem (vmtruncate_range) * mm->mmap_sem * page->flags PG_locked (lock_page) - * mapping->i_mmap_lock + * mapping->i_mmap_sem * anon_vma->lock * mm->page_table_lock or pte_lock * zone->lru_lock (in mark_page_accessed, isolate_lru_page) @@ -373,14 +373,14 @@ * The page lock not only makes sure that page->mapping cannot * suddenly be NULLified by truncation, it makes sure that the * structure at mapping cannot be freed and reused yet, - * so we can safely take mapping->i_mmap_lock. + * so we can safely take mapping->i_mmap_sem. */ BUG_ON(!PageLocked(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); /* - * i_mmap_lock does not stabilize mapcount at all, but mapcount + * i_mmap_sem does not stabilize mapcount at all, but mapcount * is more likely to be accurate if we note it after spinning. 
*/ mapcount = page_mapcount(page); @@ -403,7 +403,7 @@ break; } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return referenced; } @@ -489,12 +489,12 @@ BUG_ON(PageAnon(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { if (vma->vm_flags & VM_SHARED) ret += page_mkclean_one(page, vma); } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return ret; } @@ -930,7 +930,7 @@ unsigned long max_nl_size = 0; unsigned int mapcount; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { ret = try_to_unmap_one(page, vma, migration); if (ret == SWAP_FAIL || !page_mapped(page)) @@ -967,7 +967,6 @@ mapcount = page_mapcount(page); if (!mapcount) goto out; - cond_resched_lock(&mapping->i_mmap_lock); max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK; if (max_nl_cursor == 0) @@ -989,7 +988,6 @@ } vma->vm_private_data = (void *) max_nl_cursor; } - cond_resched_lock(&mapping->i_mmap_lock); max_nl_cursor += CLUSTER_SIZE; } while (max_nl_cursor <= max_nl_size); @@ -1001,7 +999,7 @@ list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) vma->vm_private_data = NULL; out: - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); return ret; } From andrea at qumranet.com Wed Apr 2 14:30:06 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:06 +0200 Subject: [ofa-general] [PATCH 5 of 8] We no longer abort unmapping in unmap vmas because we can reschedule while In-Reply-To: Message-ID: <316e5b1e4bf388ef0198.1207171806@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207159055 -7200 # Node ID 316e5b1e4bf388ef0198c91b3067ed1e4171d7f6 # Parent 3c3787c496cab1fc590ba3f97e7904bdfaab5375 We no longer abort unmapping in unmap vmas because we can reschedule while unmapping since we are 
holding a semaphore. This would allow moving more of the TLB flushing into
unmap_vmas(), reducing code in various places.

Signed-off-by: Christoph Lameter

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -723,8 +723,7 @@
 struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t);
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size, struct zap_details *);
-unsigned long unmap_vmas(struct mmu_gather **tlb,
-		struct vm_area_struct *start_vma, unsigned long start_addr,
+unsigned long unmap_vmas(struct vm_area_struct *start_vma, unsigned long start_addr,
 		unsigned long end_addr, unsigned long *nr_accounted,
 		struct zap_details *);
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -805,7 +805,6 @@
 /**
  * unmap_vmas - unmap a range of memory covered by a list of vma's
- * @tlbp: address of the caller's struct mmu_gather
  * @vma: the starting vma
  * @start_addr: virtual address at which to start unmapping
  * @end_addr: virtual address at which to end unmapping
@@ -817,20 +816,13 @@
  * Unmap all pages in the vma list.
  *
  * We aim to not hold locks for too long (for scheduling latency reasons).
- * So zap pages in ZAP_BLOCK_SIZE bytecounts. This means we need to
- * return the ending mmu_gather to the caller.
+ * So zap pages in ZAP_BLOCK_SIZE bytecounts.
  *
  * Only addresses between `start' and `end' will be unmapped.
  *
  * The VMA list must be sorted in ascending virtual address order.
- *
- * unmap_vmas() assumes that the caller will flush the whole unmapped address
- * range after unmap_vmas() returns. So the only responsibility here is to
- * ensure that any thus-far unmapped pages are flushed before unmap_vmas()
- * drops the lock and schedules.
*/ -unsigned long unmap_vmas(struct mmu_gather **tlbp, - struct vm_area_struct *vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *details) { @@ -838,7 +830,15 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - int fullmm = (*tlbp)->fullmm; + int fullmm; + struct mmu_gather *tlb; + struct mm_struct *mm = vma->vm_mm; + + mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); + lru_add_drain(); + tlb = tlb_gather_mmu(mm, 0); + update_hiwater_rss(mm); + fullmm = tlb->fullmm; for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -865,7 +865,7 @@ (HPAGE_SIZE / PAGE_SIZE); start = end; } else - start = unmap_page_range(*tlbp, vma, + start = unmap_page_range(tlb, vma, start, end, &zap_work, details); if (zap_work > 0) { @@ -873,13 +873,15 @@ break; } - tlb_finish_mmu(*tlbp, tlb_start, start); + tlb_finish_mmu(tlb, tlb_start, start); cond_resched(); - *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); + tlb = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } + tlb_finish_mmu(tlb, start_addr, end_addr); + mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ } @@ -893,20 +895,10 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *details) { - struct mm_struct *mm = vma->vm_mm; - struct mmu_gather *tlb; unsigned long end = address + size; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - mmu_notifier_invalidate_range_start(mm, address, end); - end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); - mmu_notifier_invalidate_range_end(mm, address, end); - if (tlb) - tlb_finish_mmu(tlb, address, end); - 
return end; + return unmap_vmas(vma, address, end, &nr_accounted, details); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1742,19 +1742,12 @@ unsigned long start, unsigned long end) { struct vm_area_struct *next = prev? prev->vm_next: mm->mmap; - struct mmu_gather *tlb; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - mmu_notifier_invalidate_range_start(mm, start, end); - unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); + unmap_vmas(vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, start, end); free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2034,7 +2027,6 @@ /* Release all mmaps. */ void exit_mmap(struct mm_struct *mm) { - struct mmu_gather *tlb; struct vm_area_struct *vma = mm->mmap; unsigned long nr_accounted = 0; unsigned long end; @@ -2045,12 +2037,9 @@ lru_add_drain(); flush_cache_mm(mm); - tlb = tlb_gather_mmu(mm, 1); - /* Don't update_hiwater_rss(mm) here, do_exit already did */ - /* Use -1 here to ensure all VMAs in the mm are unmapped */ - end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); + + end = unmap_vmas(vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, 0, end); free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* From andrea at qumranet.com Wed Apr 2 14:30:08 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:08 +0200 Subject: [ofa-general] [PATCH 7 of 8] XPMEM would have used sys_madvise() except that madvise_dontneed() In-Reply-To: Message-ID: <31fc23193bd039cc595f.1207171808@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207159059 -7200 # Node ID 31fc23193bd039cc595fba1ca149a9715f7d0fb2 # Parent dd918e267ce1d054e8364a53adcecf3c7439cff4 XPMEM would have used sys_madvise() except that madvise_dontneed() returns an -EINVAL if 
VM_PFNMAP is set, which is always true for the pages XPMEM imports from other partitions and is also true for uncached pages allocated locally via the mspec allocator. XPMEM needs zap_page_range() functionality for these types of pages as well as 'normal' pages. Signed-off-by: Dean Nelson diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -900,6 +900,7 @@ return unmap_vmas(vma, address, end, &nr_accounted, details); } +EXPORT_SYMBOL_GPL(zap_page_range); /* * Do a quick page-table lookup for a single page. From andrea at qumranet.com Wed Apr 2 14:30:09 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:09 +0200 Subject: [ofa-general] [PATCH 8 of 8] This patch adds a lock ordering rule to avoid a potential deadlock when In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207159059 -7200 # Node ID f3f119118b0abd9c4624263ef388dc7230d937fe # Parent 31fc23193bd039cc595fba1ca149a9715f7d0fb2 This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need to be locked. Signed-off-by: Dean Nelson diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -79,6 +79,9 @@ * * ->i_mutex (generic_file_buffered_write) * ->mmap_sem (fault_in_pages_readable->do_page_fault) + * + * When taking multiple mmap_sems, one should lock the lowest-addressed + * one first proceeding on up to the highest-addressed one. * * ->i_mutex * ->i_alloc_sem (various) From andrea at qumranet.com Wed Apr 2 14:30:07 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:07 +0200 Subject: [ofa-general] [PATCH 6 of 8] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207159058 -7200 # Node ID dd918e267ce1d054e8364a53adcecf3c7439cff4 # Parent 316e5b1e4bf388ef0198c91b3067ed1e4171d7f6 Convert the anon_vma spinlock to a rw semaphore. 
This allows concurrent traversal of reverse maps for try_to_unmap and page_mkclean. It also allows the calling of sleeping functions from reverse map traversal. An additional complication is that rcu is used in some contexts to guarantee the presence of the anon_vma while we acquire the lock. We cannot take a semaphore within an rcu critical section. Add a refcount to the anon_vma structure which allows us to give an existence guarantee for the anon_vma structure independent of the spinlock or the list contents. The refcount can then be taken within the RCU section. If it has been taken successfully, then the refcount guarantees the existence of the anon_vma. The refcount in anon_vma also allows us to fix a nasty issue in page migration where we fudged by using rcu for a long code path to guarantee the existence of the anon_vma. The refcount in general allows a shortening of RCU critical sections since we can do an rcu_read_unlock after taking the refcount. This is particularly relevant if the anon_vma chains contain hundreds of entries. Issues: - Atomic overhead increases in situations where a new reference to the anon_vma has to be established or removed. Overhead also increases when a speculative reference is used (try_to_unmap, page_mkclean, page migration). There are also more frequent processor changes due to up_xxx letting waiting tasks run first. This results in, e.g., the Aim9 brk performance test going down by 10-15%. Signed-off-by: Christoph Lameter diff --git a/include/linux/rmap.h b/include/linux/rmap.h --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -25,7 +25,8 @@ * pointing to this anon_vma once its vma list is empty.
*/ struct anon_vma { - spinlock_t lock; /* Serialize access to vma list */ + atomic_t refcount; /* vmas on the list */ + struct rw_semaphore sem;/* Serialize access to vma list */ struct list_head head; /* List of private "related" vmas */ }; @@ -43,18 +44,31 @@ kmem_cache_free(anon_vma_cachep, anon_vma); } +struct anon_vma *grab_anon_vma(struct page *page); + +static inline void get_anon_vma(struct anon_vma *anon_vma) +{ + atomic_inc(&anon_vma->refcount); +} + +static inline void put_anon_vma(struct anon_vma *anon_vma) +{ + if (atomic_dec_and_test(&anon_vma->refcount)) + anon_vma_free(anon_vma); +} + static inline void anon_vma_lock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); } static inline void anon_vma_unlock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } /* diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -235,15 +235,16 @@ return; /* - * We hold the mmap_sem lock. So no need to call page_lock_anon_vma. + * We hold either the mmap_sem lock or a reference on the + * anon_vma. So no need to call page_lock_anon_vma. */ anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); + down_read(&anon_vma->sem); list_for_each_entry(vma, &anon_vma->head, anon_vma_node) remove_migration_pte(vma, old, new); - spin_unlock(&anon_vma->lock); + up_read(&anon_vma->sem); } /* @@ -623,7 +624,7 @@ int rc = 0; int *result = NULL; struct page *newpage = get_new_page(page, private, &result); - int rcu_locked = 0; + struct anon_vma *anon_vma = NULL; int charge = 0; if (!newpage) @@ -647,16 +648,14 @@ } /* * By try_to_unmap(), page->mapcount goes down to 0 here. In this case, - * we cannot notice that anon_vma is freed while we migrates a page. + * we cannot notice that anon_vma is freed while we migrate a page. 
* This rcu_read_lock() delays freeing anon_vma pointer until the end * of migration. File cache pages are no problem because of page_lock() * File Caches may use write_page() or lock_page() in migration, then, * just care Anon page here. */ - if (PageAnon(page)) { - rcu_read_lock(); - rcu_locked = 1; - } + if (PageAnon(page)) + anon_vma = grab_anon_vma(page); /* * Corner case handling: @@ -674,10 +673,7 @@ if (!PageAnon(page) && PagePrivate(page)) { /* * Go direct to try_to_free_buffers() here because - * a) that's what try_to_release_page() would do anyway - * b) we may be under rcu_read_lock() here, so we can't - * use GFP_KERNEL which is what try_to_release_page() - * needs to be effective. + * that's what try_to_release_page() would do anyway */ try_to_free_buffers(page); } @@ -698,8 +694,8 @@ } else if (charge) mem_cgroup_end_migration(newpage); rcu_unlock: - if (rcu_locked) - rcu_read_unlock(); + if (anon_vma) + put_anon_vma(anon_vma); unlock: diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -565,7 +565,7 @@ if (vma->anon_vma) anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); /* * Easily overlooked: when mprotect shifts the boundary, * make sure the expanding vma has anon_vma set if the @@ -619,7 +619,7 @@ } if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); if (mapping) up_write(&mapping->i_mmap_sem); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -69,7 +69,7 @@ if (anon_vma) { allocated = NULL; locked = anon_vma; - spin_lock(&locked->lock); + down_write(&locked->sem); } else { anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) @@ -81,6 +81,7 @@ /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { + get_anon_vma(anon_vma); vma->anon_vma = anon_vma; list_add_tail(&vma->anon_vma_node, &anon_vma->head); allocated = NULL; @@ -88,7 +89,7 @@ spin_unlock(&mm->page_table_lock); if 
(locked) - spin_unlock(&locked->lock); + up_write(&locked->sem); if (unlikely(allocated)) anon_vma_free(allocated); } @@ -99,14 +100,17 @@ { BUG_ON(vma->anon_vma != next->anon_vma); list_del(&next->anon_vma_node); + put_anon_vma(vma->anon_vma); } void __anon_vma_link(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - if (anon_vma) + if (anon_vma) { + get_anon_vma(anon_vma); list_add_tail(&vma->anon_vma_node, &anon_vma->head); + } } void anon_vma_link(struct vm_area_struct *vma) @@ -114,36 +118,32 @@ struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + get_anon_vma(anon_vma); + down_write(&anon_vma->sem); list_add_tail(&vma->anon_vma_node, &anon_vma->head); - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } } void anon_vma_unlink(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - int empty; if (!anon_vma) return; - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); list_del(&vma->anon_vma_node); - - /* We must garbage collect the anon_vma if it's empty */ - empty = list_empty(&anon_vma->head); - spin_unlock(&anon_vma->lock); - - if (empty) - anon_vma_free(anon_vma); + up_write(&anon_vma->sem); + put_anon_vma(anon_vma); } static void anon_vma_ctor(struct kmem_cache *cachep, void *data) { struct anon_vma *anon_vma = data; - spin_lock_init(&anon_vma->lock); + init_rwsem(&anon_vma->sem); + atomic_set(&anon_vma->refcount, 0); INIT_LIST_HEAD(&anon_vma->head); } @@ -157,9 +157,9 @@ * Getting a lock on a stable anon_vma from a page off the LRU is * tricky: page_lock_anon_vma rely on RCU to guard against the races. 
*/ -static struct anon_vma *page_lock_anon_vma(struct page *page) +struct anon_vma *grab_anon_vma(struct page *page) { - struct anon_vma *anon_vma; + struct anon_vma *anon_vma = NULL; unsigned long anon_mapping; rcu_read_lock(); @@ -170,17 +170,26 @@ goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); - return anon_vma; + if (!atomic_inc_not_zero(&anon_vma->refcount)) + anon_vma = NULL; out: rcu_read_unlock(); - return NULL; + return anon_vma; +} + +static struct anon_vma *page_lock_anon_vma(struct page *page) +{ + struct anon_vma *anon_vma = grab_anon_vma(page); + + if (anon_vma) + down_read(&anon_vma->sem); + return anon_vma; } static void page_unlock_anon_vma(struct anon_vma *anon_vma) { - spin_unlock(&anon_vma->lock); - rcu_read_unlock(); + up_read(&anon_vma->sem); + put_anon_vma(anon_vma); } /* From andrea at qumranet.com Wed Apr 2 14:53:34 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 23:53:34 +0200 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: <20080402215334.GT19189@duo.random> On Wed, Apr 02, 2008 at 10:59:50AM -0700, Christoph Lameter wrote: > Did I see #v10? Could you start a new subject when you post please? Do > not respond to some old message otherwise the threading will be wrong. I wasn't clear enough, #v10 was in the works... I was thinking about the last two issues before posting it. > How exactly does the GRU corrupt memory? Jack added synchronize_rcu, I assume for a reason. > > > Another less obviously safe approach is to allow the register > > method to succeed only when mm_users=1 and the task is single > > threaded. 
This way if all the places where the mmu notifiers aren't > > invoked on the mm not by the current task, are only doing > > invalidates after/before zapping ptes, if the instantiation of new > > ptes is single threaded too, we shouldn't worry if we miss an > > invalidate for a pte that is zero and doesn't point to any physical > > page. In the places where current->mm != mm I'm using > > invalidate_page 99% of the time, and that only follows the > > ptep_clear_flush. The problem is the range_begin that will happen > > before zapping the pte in places where current->mm != > > mm. Unfortunately in my incremental patch where I move all > > invalidate_page outside of the PT lock to prepare for allowing > > sleeping inside the mmu notifiers, I used range_begin/end in places > > like try_to_unmap_cluster where current->mm != mm. In general > > this solution looks more fragile than the seqlock. > > Hmmm... Okay that is one solution that would just require a BUG_ON in the > registration methods. Perhaps you didn't notice that this solution can't work if you call range_begin/end not in the "current" context, and try_to_unmap_cluster does exactly that for both my patchset and yours. Missing an _end is ok; missing a _begin is never ok. > Well doesn't the requirement of just one execution thread also deal with > that issue? Yes, except again it can't work for try_to_unmap_cluster. This solution is only applicable to #v10 if I fix try_to_unmap_cluster to only call invalidate_page (relying on the fact that the VM holds a pin and a lock on any page that is being mmu-notifier-invalidated). You can't use the single-threaded approach to solve either 1 or 2, because your _begin call is called anywhere, and that's where you call the secondary-tlb flush and it's fatal to miss it. invalidate_page is always called after, so it enforces the tlb flush to be called _after_ and is therefore inherently safe.
From andrea at qumranet.com Wed Apr 2 14:56:04 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 23:56:04 +0200 Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> <20080402175058.GR19189@duo.random> Message-ID: <20080402215604.GU19189@duo.random> On Wed, Apr 02, 2008 at 11:15:26AM -0700, Christoph Lameter wrote: > On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > > > On Tue, Apr 01, 2008 at 01:55:36PM -0700, Christoph Lameter wrote: > > > This results in f.e. the Aim9 brk performance test to got down by 10-15%. > > > > I guess it's more likely because of overscheduling for small critical > > sections, did you count the total number of context switches? I > > guess there will be a lot more with your patch applied. That > > regression is a showstopper and it is the reason why I've suggested > > before to add a CONFIG_XPMEM or CONFIG_MMU_NOTIFIER_SLEEP config > > option to make the VM locks sleep capable only when XPMEM=y > > (PREEMPT_RT will enable it too). Thanks for doing the benchmark work! > > There are more context switches if locks are contended. > > But that has actually also some good aspects because we avoid busy loops > and can potentially continue work in another process. That would be the case if the "wait time" were longer than the scheduling time; the whole point is that with anon-vma the write side is so fast it's likely never worth scheduling (probably not even with preempt-rt for the write side; the read side is an entirely different matter, but the read side can run concurrently if the system is paging heavily), hence the slowdown. What you benchmarked is the write side, which is also the fast path when the system is heavily CPU bound. I have to say AIM is a great benchmark for testing this regression. But I think a config option will solve all of this.
From clameter at sgi.com Wed Apr 2 14:54:52 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 14:54:52 -0700 (PDT) Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402215334.GT19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402215334.GT19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > > Hmmm... Okay that is one solution that would just require a BUG_ON in the > > registration methods. > > Perhaps you didn't notice that this solution can't work if you call > range_begin/end not in the "current" context and try_to_unmap_cluster > does exactly that for both my patchset and yours. Missing an _end is > ok, missing a _begin is never ok. If you look at the patch you will see a requirement of holding a writelock on mmap_sem which will keep out get_user_pages(). From clameter at sgi.com Wed Apr 2 14:56:25 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 14:56:25 -0700 (PDT) Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: <20080402215604.GU19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> <20080402175058.GR19189@duo.random> <20080402215604.GU19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > paging), hence the slowdown. What you benchmarked is the write side, > which is also the fast path when the system is heavily CPU bound. I've > to say aim is a great benchmark to test this regression. I am a bit surprised that brk performance is that important. There may be other measurements that have to be made to assess how this would impact a real load.
From andrea at qumranet.com Wed Apr 2 15:01:48 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 00:01:48 +0200 Subject: [ofa-general] Re: EMM: Require single threadedness for registration. In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: <20080402220148.GV19189@duo.random> On Wed, Apr 02, 2008 at 02:05:28PM -0700, Christoph Lameter wrote: > Here is a patch to require single threaded execution during emm_register. > This also allows an easy implementation of an unregister function and gets > rid of the races that Andrea worried about. That would work for #v10 if I remove the invalidate_range_start from try_to_unmap_cluster, but it can't work for EMM because you have emm_invalidate_start firing anywhere outside the context of the current task (even regular rmap code, not just the nonlinear corner case, will trigger the race). In short, the single-threaded approach would be workable only thanks to the fact that #v10 has the notion of invalidate_page for flushing the tlb _after_ and for avoiding blocking the secondary page fault during swapping. In the kvm case I don't want to block the page fault for anything but madvise, which is strictly only used after the guest inflated the balloon; the existence of invalidate_page allows that optimization and avoids serializing against the kvm page fault during all regular page faults, since invalidate_page is called while the page is pinned by the VM. The requirement for invalidate_page is that the pte and linux tlb are flushed _before_ and the page is freed _after_ the invalidate_page method. That's not the case for _begin/_end. The page is freed well before _end runs, hence the need for _begin and for blocking the secondary mmu page fault during the vma-mangling operations.
#v10 takes care of all this, and although I could perhaps fix the remaining two issues using the single-threaded enforcement I suggested, I preferred to play it safe and spend an unsigned per-mm in case anybody needs to attach at runtime; the single-threaded restriction didn't look very clean. From clameter at sgi.com Wed Apr 2 15:03:12 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 15:03:12 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 2 of 8] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > diff --git a/mm/memory.c b/mm/memory.c > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1626,9 +1626,10 @@ > */ > page_table = pte_offset_map_lock(mm, pmd, address, > &ptl); > - page_cache_release(old_page); > + new_page = NULL; > if (!pte_same(*page_table, orig_pte)) > goto unlock; > + page_cache_release(old_page); > > page_mkwrite = 1; > } This is deferring frees and not moving the callouts. KVM specific? What exactly is this doing? A significant portion of this seems to be undoing what the first patch did. From clameter at sgi.com Wed Apr 2 15:06:19 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 15:06:19 -0700 (PDT) Subject: [ofa-general] Re: EMM: Require single threadedness for registration. In-Reply-To: <20080402220148.GV19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> Message-ID: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > That would work for #v10 if I remove the invalidate_range_start from > try_to_unmap_cluster, it can't work for EMM because you've > emm_invalidate_start firing anywhere outside the context of the > current task (even regular rmap code, not just nonlinear corner case > will trigger the race).
In short the single threaded approach would be But in that case it will be firing for a callback to another mm_struct. The notifiers are bound to mm_structs and keep separate contexts. > The requirement for invalidate_page is that the pte and linux tlb are > flushed _before_ and the page is freed _after_ the invalidate_page > method. that's not the case for _begin/_end. The page is freed well > before _end runs, hence the need of _begin and to block the secondary > mmu page fault during the vma-mangling operations. You could flush in _begin and free on _end? I thought you were taking a refcount on the page? You can drop the refcount only on _end to ensure that the page does not go away before. From andrea at qumranet.com Wed Apr 2 15:09:36 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 00:09:36 +0200 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402215334.GT19189@duo.random> Message-ID: <20080402220936.GW19189@duo.random> On Wed, Apr 02, 2008 at 02:54:52PM -0700, Christoph Lameter wrote: > On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > > > > Hmmm... Okay that is one solution that would just require a BUG_ON in the > > > registration methods. > > > > Perhaps you didn't notice that this solution can't work if you call > > range_begin/end not in the "current" context and try_to_unmap_cluster > > does exactly that for both my patchset and yours. Missing an _end is > > ok, missing a _begin is never ok. > > If you look at the patch you will see a requirement of holding a > writelock on mmap_sem which will keep out get_user_pages(). I said try_to_unmap_cluster, not get_user_pages.
   CPU0                                     CPU1
   try_to_unmap_cluster:
     emm_invalidate_start in EMM (or
     mmu_notifier_invalidate_range_start
     in #v10)
     walking the list by hand in EMM
     (or with hlist cleaner in #v10)
     xpmem method invoked
     schedule for a long while inside
     invalidate_range_start while skbs
     are sent
                                            gru registers
                                            synchronize_rcu (sorry useless now)
                                            single threaded, so taking a page fault
                                            secondary tlb instantiated
     xpmem method returns
     end of the list (didn't notice that
     it has to restart to flush the gru)
     zap pte
     free the page
                                            gru corrupts memory

CPU 1 was single threaded, CPU0 doesn't hold any mmap_sem or any other lock that could ever serialize against the GRU as far as I can tell. In general my #v10 solution mixing seqlock + rcu looks more robust and allows multithreaded attachment of mmu notifiers as well. I could have fixed it with the single-threaded approach thanks to the fact that the only place outside the mm->mmap_sem is try_to_unmap_cluster for me, but it wasn't simple to convert, nor worth it, given nonlinear isn't worth optimizing for (not even the core VM cares about try_to_unmap_cluster, which is in fact the only place in the VM with an O(N) complexity for each try_to_unmap call, where N is the size of the mapping divided by page_size). From andrea at qumranet.com Wed Apr 2 15:12:28 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 00:12:28 +0200 Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> <20080402175058.GR19189@duo.random> <20080402215604.GU19189@duo.random> Message-ID: <20080402221228.GX19189@duo.random> On Wed, Apr 02, 2008 at 02:56:25PM -0700, Christoph Lameter wrote: > I am a bit surprised that brk performance is that important. There may be I think it's not brk but fork that is being slowed down, did you oprofile? AIM forks a lot...
The write-side fast path generating the overscheduling, I guess, is when the new vmas are created for the child and queued in the parent anon-vma in O(1), so it's immediate; even preempt-rt would be ok with it spinning and not scheduling, since it's just a list_add (much faster than schedule() indeed). Every time there's a collision, when multiple children fork simultaneously and they all try to queue in the same anon-vma, things will slow down. From andrea at qumranet.com Wed Apr 2 15:17:16 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 00:17:16 +0200 Subject: [ofa-general] Re: EMM: Require single threadedness for registration. In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> Message-ID: <20080402221716.GY19189@duo.random> On Wed, Apr 02, 2008 at 03:06:19PM -0700, Christoph Lameter wrote: > On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > > > That would work for #v10 if I remove the invalidate_range_start from > > try_to_unmap_cluster, it can't work for EMM because you've > > emm_invalidate_start firing anywhere outside the context of the > > current task (even regular rmap code, not just nonlinear corner case > > will trigger the race). In short the single threaded approach would be > > But in that case it will be firing for a callback to another mm_struct. > The notifiers are bound to mm_structs and keep separate contexts. Why can't it fire on the mm_struct where GRU just registered? That mm_struct existed way before GRU registered, and the VM is free to unmap it w/o mmap_sem if there was any memory pressure. > You could flush in _begin and free on _end? I thought you are taking a > refcount on the page? You can drop the refcount only on _end to ensure > that the page does not go away before. We're going to lock + flush on _begin and unlock on _end w/o refcounting, to microoptimize. Free is done by unmap_vmas/madvise/munmap at will.
That's a very slow path, inflating the balloon is not problematic. But invalidate_page allows to avoid blocking page faults during swapping so minor faults can happen and refresh the pte young bits etc... When the VM unmaps the page while holding the page pin, there's no race and that's where invalidate_page is being used to generate lower invalidation overhead. From clameter at sgi.com Wed Apr 2 15:34:01 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 15:34:01 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 8] Core of mmu notifiers In-Reply-To: References: Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > + void (*invalidate_page)(struct mmu_notifier *mn, > + struct mm_struct *mm, > + unsigned long address); > + > + void (*invalidate_range_start)(struct mmu_notifier *mn, > + struct mm_struct *mm, > + unsigned long start, unsigned long end); > + void (*invalidate_range_end)(struct mmu_notifier *mn, > + struct mm_struct *mm, > + unsigned long start, unsigned long end); Still two methods ... > +void __mmu_notifier_release(struct mm_struct *mm) > +{ > + struct mmu_notifier *mn; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { > + mn = hlist_entry(mm->mmu_notifier_list.first, > + struct mmu_notifier, > + hlist); > + hlist_del(&mn->hlist); > + if (mn->ops->release) > + mn->ops->release(mn, mm); > + BUG_ON(read_seqretry(&mm->mmu_notifier_lock, seq)); > + } > +} seqlock just taken for checking if everything is ok? > + > +/* > + * If no young bitflag is supported by the hardware, ->clear_flush_young can > + * unmap the address and return 1 or 0 depending if the mapping previously > + * existed or not. 
> + */ > +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, > + unsigned long address) > +{ > + struct mmu_notifier *mn; > + struct hlist_node *n; > + int young = 0; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + do { > + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > + if (mn->ops->clear_flush_young) > + young |= mn->ops->clear_flush_young(mn, mm, > + address); > + } > + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); > + The critical section could be run multiple times for one callback which could result in multiple callbacks to clear the young bit. Guess not that big of an issue? > +void __mmu_notifier_invalidate_page(struct mm_struct *mm, > + unsigned long address) > +{ > + struct mmu_notifier *mn; > + struct hlist_node *n; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + do { > + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > + if (mn->ops->invalidate_page) > + mn->ops->invalidate_page(mn, mm, address); > + } > + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); > +} Ok. Retry would try to invalidate the page a second time which is not a problem unless you would drop the refcount or make other state changes that require correspondence with mapping. I guess this is the reason that you stopped adding a refcount? > +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, > + unsigned long start, unsigned long end) > +{ > + struct mmu_notifier *mn; > + struct hlist_node *n; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + do { > + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > + if (mn->ops->invalidate_range_start) > + mn->ops->invalidate_range_start(mn, mm, > + start, end); > + } > + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); > +} Multiple invalidate_range_starts on the same range? This means the driver needs to be able to deal with the situation and ignore the repeated call? 
> +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, > + unsigned long start, unsigned long end) > +{ > + struct mmu_notifier *mn; > + struct hlist_node *n; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + do { > + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > + if (mn->ops->invalidate_range_end) > + mn->ops->invalidate_range_end(mn, mm, > + start, end); > + } > + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); > +} Retry can lead to multiple invalidate_range callbacks with the same parameters? Driver needs to ignore if the range is already clear? From clameter at sgi.com Wed Apr 2 15:41:34 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 15:41:34 -0700 (PDT) Subject: [ofa-general] Re: EMM: Require single threadedness for registration. In-Reply-To: <20080402221716.GY19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> Message-ID: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > Why can't it fire on the mm_struct where GRU just registered? That > mm_struct existed way before GRU registered, and VM is free to unmap > it w/o mmap_sem if there was any memory pressure. Right. Hmmm... Bad situation. We would have invalidate_start take a lock to prevent registration until _end has run. We could use stop_machine_run to register the notifier.... ;-). From ralph.campbell at qlogic.com Wed Apr 2 15:49:01 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:01 -0700 Subject: [ofa-general] [PATCH 0/20] IB/ipath -- DDR HCA patches in for-roland for 2.6.26 Message-ID: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> The following patches add the remaining changes needed to fully support the QLogic 7220 DDR HCAs. 
These also will make 2.6.26 match what is in OFED-1.3 plus some recent minor fixes and code style clean up. These can also be pulled into Roland's infiniband.git for-2.6.26 repo using: git pull git://git.qlogic.com/ipath-linux-2.6 for-roland From ralph.campbell at qlogic.com Wed Apr 2 15:49:06 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:06 -0700 Subject: [ofa-general] [PATCH 01/20] IB/ipath - Allow old and new diagnostic packet formats In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224906.28598.75040.stgit@eng-46.mv.qlogic.com> From: Michael Albaugh This patch checks for old and new format writes to send a packet via the diagnostic interface. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_diag.c | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c index af59bf3..c9bfd82 100644 --- a/drivers/infiniband/hw/ipath/ipath_diag.c +++ b/drivers/infiniband/hw/ipath/ipath_diag.c @@ -332,12 +332,17 @@ static ssize_t ipath_diagpkt_write(struct file *fp, u64 val; u32 l_state, lt_state; /* LinkState, LinkTrainingState */ - if (count != sizeof(dp)) { + if (count < sizeof(odp)) { ret = -EINVAL; goto bail; } - if (copy_from_user(&dp, data, sizeof(dp))) { + if (count == sizeof(dp)) { + if (copy_from_user(&dp, data, sizeof(dp))) { + ret = -EFAULT; + goto bail; + } + } else if (copy_from_user(&odp, data, sizeof(odp))) { ret = -EFAULT; goto bail; } From ralph.campbell at qlogic.com Wed Apr 2 15:49:11 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:11 -0700 Subject: [ofa-general] [PATCH 02/20] IB/ipath - fix some white space and code style issues In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: 
<20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224911.28598.34434.stgit@eng-46.mv.qlogic.com> This patch makes some white space changes and minor non-functional changes to more closely match the code in OFED-1.3. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_driver.c | 29 ++++++++++++++----------- drivers/infiniband/hw/ipath/ipath_init_chip.c | 7 +++--- drivers/infiniband/hw/ipath/ipath_intr.c | 16 +++++++------- drivers/infiniband/hw/ipath/ipath_kernel.h | 4 ++- drivers/infiniband/hw/ipath/ipath_registers.h | 4 ++- drivers/infiniband/hw/ipath/ipath_stats.c | 13 ++++++----- 6 files changed, 38 insertions(+), 35 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index dfa009a..f79d9cc 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -627,7 +627,8 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, goto bail; bail_irqsetup: - if (pdev->irq) free_irq(pdev->irq, dd); + if (pdev->irq) + free_irq(pdev->irq, dd); bail_iounmap: iounmap((volatile void __iomem *) dd->ipath_kregbase); @@ -1704,7 +1705,10 @@ bail: void ipath_cancel_sends(struct ipath_devdata *dd, int restore_sendctrl) { ipath_dbg("Cancelling all in-progress send buffers\n"); - dd->ipath_lastcancel = jiffies+HZ/2; /* skip armlaunch errs a bit */ + + /* skip armlaunch errs for a while */ + dd->ipath_lastcancel = jiffies + HZ / 2; + /* * the abort bit is auto-clearing. We read scratch to be sure * that cancels and the abort have taken effect in the chip. 
@@ -2070,9 +2074,8 @@ void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val) dd->ipath_led_override_timer.data = (unsigned long) dd; dd->ipath_led_override_timer.expires = jiffies + 1; add_timer(&dd->ipath_led_override_timer); - } else { + } else atomic_dec(&dd->ipath_led_override_timer_active); - } } /** @@ -2220,12 +2223,12 @@ void ipath_free_pddata(struct ipath_devdata *dd, struct ipath_portdata *pd) "ipath_port0_skbinfo @ %p\n", pd->port_port, skbinfo); for (e = 0; e < dd->ipath_rcvegrcnt; e++) - if (skbinfo[e].skb) { - pci_unmap_single(dd->pcidev, skbinfo[e].phys, - dd->ipath_ibmaxlen, - PCI_DMA_FROMDEVICE); - dev_kfree_skb(skbinfo[e].skb); - } + if (skbinfo[e].skb) { + pci_unmap_single(dd->pcidev, skbinfo[e].phys, + dd->ipath_ibmaxlen, + PCI_DMA_FROMDEVICE); + dev_kfree_skb(skbinfo[e].skb); + } vfree(skbinfo); } kfree(pd->port_tid_pg_list); @@ -2468,10 +2471,10 @@ void ipath_hol_event(unsigned long opaque) int ipath_set_rx_pol_inv(struct ipath_devdata *dd, u8 new_pol_inv) { u64 val; - if ( new_pol_inv > INFINIPATH_XGXS_RX_POL_MASK ) { + + if (new_pol_inv > INFINIPATH_XGXS_RX_POL_MASK) return -1; - } - if ( dd->ipath_rx_pol_inv != new_pol_inv ) { + if (dd->ipath_rx_pol_inv != new_pol_inv) { dd->ipath_rx_pol_inv = new_pol_inv; val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig); val &= ~(INFINIPATH_XGXS_RX_POL_MASK << diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 786a5e0..94f938f 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -645,7 +645,6 @@ done: return ret; } - /** * ipath_init_chip - do the actual initialization sequence on the chip * @dd: the infinipath device @@ -754,7 +753,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) dd->ipath_f_early_init(dd); /* - * cancel any possible active sends from early driver load. + * Cancel any possible active sends from early driver load. 
* Follows early_init because some chips have to initialize * PIO buffers in early_init to avoid false parity errors. */ @@ -884,7 +883,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) &dd->pcidev->dev, pd->port_rcvhdrq_size, &dd->ipath_dummy_hdrq_phys, gfp_flags); - if (!dd->ipath_dummy_hdrq ) { + if (!dd->ipath_dummy_hdrq) { dev_info(&dd->pcidev->dev, "Couldn't allocate 0x%lx bytes for dummy hdrq\n", pd->port_rcvhdrq_size); @@ -899,7 +898,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) */ ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, 0ULL); - if(!dd->ipath_stats_timer_active) { + if (!dd->ipath_stats_timer_active) { /* * first init, or after an admin disable/enable * set up stats retrieval timer, even if we had errors diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index d1e13a4..41329e7 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -590,18 +590,19 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) * ones on this particular interrupt, which also isn't great */ dd->ipath_maskederrs |= dd->ipath_lasterror | errs; + dd->ipath_errormask &= ~dd->ipath_maskederrs; ipath_write_kreg(dd, dd->ipath_kregs->kr_errormask, - dd->ipath_errormask); + dd->ipath_errormask); s_iserr = ipath_decode_err(msg, sizeof msg, - dd->ipath_maskederrs); + dd->ipath_maskederrs); if (dd->ipath_maskederrs & - ~(INFINIPATH_E_RRCVEGRFULL | - INFINIPATH_E_RRCVHDRFULL | INFINIPATH_E_PKTERRS)) + ~(INFINIPATH_E_RRCVEGRFULL | + INFINIPATH_E_RRCVHDRFULL | INFINIPATH_E_PKTERRS)) ipath_dev_err(dd, "Temporarily disabling " "error(s) %llx reporting; too frequent (%s)\n", - (unsigned long long)dd->ipath_maskederrs, + (unsigned long long) dd->ipath_maskederrs, msg); else { /* @@ -786,7 +787,6 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) return chkerrpkts; } - /* * try to cleanup as much as possible for anything that might 
have gone * wrong while in freeze mode, such as pio buffers being written by user @@ -974,6 +974,7 @@ static void handle_urcv(struct ipath_devdata *dd, u32 istat) dd->ipath_i_rcvurg_mask); for (i = 1; i < dd->ipath_cfgports; i++) { struct ipath_portdata *pd = dd->ipath_pd[i]; + if (portr & (1 << i) && pd && pd->port_cnt) { if (test_and_clear_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) { @@ -1095,8 +1096,7 @@ irqreturn_t ipath_intr(int irq, void *data) gpiostatus = ipath_read_kreg32( dd, dd->ipath_kregs->kr_gpio_status); - /* First the error-counter case. - */ + /* First the error-counter case. */ if ((gpiostatus & IPATH_GPIO_ERRINTR_MASK) && (dd->ipath_flags & IPATH_GPIO_ERRINTRS)) { /* want to clear the bits we see asserted. */ diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index f10442f..8018383 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -812,7 +812,7 @@ void ipath_hol_event(unsigned long); */ /* chip can report link latency (IB 1.2) */ #define IPATH_HAS_LINK_LATENCY 0x1 -/* The chip is up and initted */ + /* The chip is up and initted */ #define IPATH_INITTED 0x2 /* set if any user code has set kr_rcvhdrsize */ #define IPATH_RCVHDRSZ_SET 0x4 @@ -1148,7 +1148,7 @@ extern struct mutex ipath_mutex; # define __IPATH_DBG_WHICH(which,fmt,...) 
\ do { \ - if(unlikely(ipath_debug&(which))) \ + if (unlikely(ipath_debug & (which))) \ printk(KERN_DEBUG IPATH_DRV_NAME ": %s: " fmt, \ __func__,##__VA_ARGS__); \ } while(0) diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h index 61e5621..f49f184 100644 --- a/drivers/infiniband/hw/ipath/ipath_registers.h +++ b/drivers/infiniband/hw/ipath/ipath_registers.h @@ -186,8 +186,8 @@ #define INFINIPATH_IBCC_LINKINITCMD_SLEEP 3 #define INFINIPATH_IBCC_LINKINITCMD_SHIFT 16 #define INFINIPATH_IBCC_LINKCMD_MASK 0x3ULL -#define INFINIPATH_IBCC_LINKCMD_DOWN 1 /* move to 0x11 */ -#define INFINIPATH_IBCC_LINKCMD_ARMED 2 /* move to 0x21 */ +#define INFINIPATH_IBCC_LINKCMD_DOWN 1 /* move to 0x11 */ +#define INFINIPATH_IBCC_LINKCMD_ARMED 2 /* move to 0x21 */ #define INFINIPATH_IBCC_LINKCMD_ACTIVE 3 /* move to 0x31 */ #define INFINIPATH_IBCC_LINKCMD_SHIFT 18 #define INFINIPATH_IBCC_MAXPKTLEN_MASK 0x7FFULL diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index d2725cd..57eb1d5 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -293,8 +293,8 @@ void ipath_get_faststats(unsigned long opaque) iserr = ipath_decode_err(ebuf, sizeof ebuf, dd->ipath_maskederrs); if (dd->ipath_maskederrs & - ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | - INFINIPATH_E_PKTERRS )) + ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_PKTERRS)) ipath_dev_err(dd, "Re-enabling masked errors " "(%s)\n", ebuf); else { @@ -306,17 +306,18 @@ void ipath_get_faststats(unsigned long opaque) * level. 
*/ if (iserr) - ipath_dbg("Re-enabling queue full errors (%s)\n", - ebuf); + ipath_dbg( + "Re-enabling queue full errors (%s)\n", + ebuf); else ipath_cdbg(ERRPKT, "Re-enabling packet" - " problem interrupt (%s)\n", ebuf); + " problem interrupt (%s)\n", ebuf); } /* re-enable masked errors */ dd->ipath_errormask |= dd->ipath_maskederrs; ipath_write_kreg(dd, dd->ipath_kregs->kr_errormask, - dd->ipath_errormask); + dd->ipath_errormask); dd->ipath_maskederrs = 0; } From ralph.campbell at qlogic.com Wed Apr 2 15:49:16 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:16 -0700 Subject: [ofa-general] [PATCH 03/20] IB/ipath - add support for 7220 receive queue changes In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224916.28598.52413.stgit@eng-46.mv.qlogic.com> Newer HCAs have a HW option to write a sequence number to each receive queue entry and avoid a separate DMA of the tail register to memory. This patch adds support for these changes. 
Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_common.h | 31 ++++ drivers/infiniband/hw/ipath/ipath_driver.c | 194 ++++++++++++++----------- drivers/infiniband/hw/ipath/ipath_file_ops.c | 34 ++-- drivers/infiniband/hw/ipath/ipath_iba6110.c | 2 drivers/infiniband/hw/ipath/ipath_iba6120.c | 2 drivers/infiniband/hw/ipath/ipath_init_chip.c | 152 +++++++++++--------- drivers/infiniband/hw/ipath/ipath_intr.c | 46 +++--- drivers/infiniband/hw/ipath/ipath_kernel.h | 53 +++++-- drivers/infiniband/hw/ipath/ipath_registers.h | 2 drivers/infiniband/hw/ipath/ipath_stats.c | 14 +- 10 files changed, 305 insertions(+), 225 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 591901a..edd4183 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -198,7 +198,7 @@ typedef enum _ipath_ureg { #define IPATH_RUNTIME_FORCE_WC_ORDER 0x4 #define IPATH_RUNTIME_RCVHDR_COPY 0x8 #define IPATH_RUNTIME_MASTER 0x10 -/* 0x20 and 0x40 are no longer used, but are reserved for ABI compatibility */ +#define IPATH_RUNTIME_NODMA_RTAIL 0x80 #define IPATH_RUNTIME_FORCE_PIOAVAIL 0x400 #define IPATH_RUNTIME_PIO_REGSWAPPED 0x800 @@ -662,8 +662,12 @@ struct infinipath_counters { #define INFINIPATH_RHF_LENGTH_SHIFT 0 #define INFINIPATH_RHF_RCVTYPE_MASK 0x7 #define INFINIPATH_RHF_RCVTYPE_SHIFT 11 -#define INFINIPATH_RHF_EGRINDEX_MASK 0x7FF +#define INFINIPATH_RHF_EGRINDEX_MASK 0xFFF #define INFINIPATH_RHF_EGRINDEX_SHIFT 16 +#define INFINIPATH_RHF_SEQ_MASK 0xF +#define INFINIPATH_RHF_SEQ_SHIFT 0 +#define INFINIPATH_RHF_HDRQ_OFFSET_MASK 0x7FF +#define INFINIPATH_RHF_HDRQ_OFFSET_SHIFT 4 #define INFINIPATH_RHF_H_ICRCERR 0x80000000 #define INFINIPATH_RHF_H_VCRCERR 0x40000000 #define INFINIPATH_RHF_H_PARITYERR 0x20000000 @@ -673,6 +677,8 @@ struct infinipath_counters { #define INFINIPATH_RHF_H_TIDERR 0x02000000 #define INFINIPATH_RHF_H_MKERR 0x01000000 #define 
INFINIPATH_RHF_H_IBERR 0x00800000 +#define INFINIPATH_RHF_H_ERR_MASK 0xFF800000 +#define INFINIPATH_RHF_L_USE_EGR 0x80000000 #define INFINIPATH_RHF_L_SWA 0x00008000 #define INFINIPATH_RHF_L_SWB 0x00004000 @@ -696,6 +702,7 @@ struct infinipath_counters { /* SendPIO per-buffer control */ #define INFINIPATH_SP_TEST 0x40 #define INFINIPATH_SP_TESTEBP 0x20 +#define INFINIPATH_SP_TRIGGER_SHIFT 15 /* SendPIOAvail bits */ #define INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT 1 @@ -762,6 +769,7 @@ struct ether_header { #define IPATH_MSN_MASK 0xFFFFFF #define IPATH_QPN_MASK 0xFFFFFF #define IPATH_MULTICAST_LID_BASE 0xC000 +#define IPATH_EAGER_TID_ID INFINIPATH_I_TID_MASK #define IPATH_MULTICAST_QPN 0xFFFFFF /* Receive Header Queue: receive type (from infinipath) */ @@ -781,7 +789,7 @@ struct ether_header { */ static inline __u32 ipath_hdrget_err_flags(const __le32 * rbuf) { - return __le32_to_cpu(rbuf[1]); + return __le32_to_cpu(rbuf[1]) & INFINIPATH_RHF_H_ERR_MASK; } static inline __u32 ipath_hdrget_rcv_type(const __le32 * rbuf) @@ -802,6 +810,23 @@ static inline __u32 ipath_hdrget_index(const __le32 * rbuf) & INFINIPATH_RHF_EGRINDEX_MASK; } +static inline __u32 ipath_hdrget_seq(const __le32 *rbuf) +{ + return (__le32_to_cpu(rbuf[1]) >> INFINIPATH_RHF_SEQ_SHIFT) + & INFINIPATH_RHF_SEQ_MASK; +} + +static inline __u32 ipath_hdrget_offset(const __le32 *rbuf) +{ + return (__le32_to_cpu(rbuf[1]) >> INFINIPATH_RHF_HDRQ_OFFSET_SHIFT) + & INFINIPATH_RHF_HDRQ_OFFSET_MASK; +} + +static inline __u32 ipath_hdrget_use_egr_buf(const __le32 *rbuf) +{ + return __le32_to_cpu(rbuf[0]) & INFINIPATH_RHF_L_USE_EGR; +} + static inline __u32 ipath_hdrget_ipath_ver(__le32 hdrword) { return (__le32_to_cpu(hdrword) >> INFINIPATH_I_VERS_SHIFT) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index f79d9cc..eef2599 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -41,7 +41,6 @@ #include "ipath_kernel.h" 
#include "ipath_verbs.h" -#include "ipath_common.h" static void ipath_update_pio_bufs(struct ipath_devdata *); @@ -720,6 +719,8 @@ static void __devexit cleanup_device(struct ipath_devdata *dd) tmpp = dd->ipath_pageshadow; dd->ipath_pageshadow = NULL; vfree(tmpp); + + dd->ipath_egrtidbase = NULL; } /* @@ -1078,18 +1079,17 @@ static void ipath_rcv_hdrerr(struct ipath_devdata *dd, u32 eflags, u32 l, u32 etail, - u64 *rc) + __le32 *rhf_addr, + struct ipath_message_header *hdr) { char emsg[128]; - struct ipath_message_header *hdr; get_rhf_errstring(eflags, emsg, sizeof emsg); - hdr = (struct ipath_message_header *)&rc[1]; ipath_cdbg(PKT, "RHFerrs %x hdrqtail=%x typ=%u " "tlen=%x opcode=%x egridx=%x: %s\n", eflags, l, - ipath_hdrget_rcv_type((__le32 *) rc), - ipath_hdrget_length_in_bytes((__le32 *) rc), + ipath_hdrget_rcv_type(rhf_addr), + ipath_hdrget_length_in_bytes(rhf_addr), be32_to_cpu(hdr->bth[0]) >> 24, etail, emsg); @@ -1114,55 +1114,52 @@ static void ipath_rcv_hdrerr(struct ipath_devdata *dd, */ void ipath_kreceive(struct ipath_portdata *pd) { - u64 *rc; struct ipath_devdata *dd = pd->port_dd; + __le32 *rhf_addr; void *ebuf; const u32 rsize = dd->ipath_rcvhdrentsize; /* words */ const u32 maxcnt = dd->ipath_rcvhdrcnt * rsize; /* words */ u32 etail = -1, l, hdrqtail; struct ipath_message_header *hdr; - u32 eflags, i, etype, tlen, pkttot = 0, updegr=0, reloop=0; + u32 eflags, i, etype, tlen, pkttot = 0, updegr = 0, reloop = 0; static u64 totcalls; /* stats, may eventually remove */ - - if (!dd->ipath_hdrqtailptr) { - ipath_dev_err(dd, - "hdrqtailptr not set, can't do receives\n"); - goto bail; - } + int last; l = pd->port_head; - hdrqtail = ipath_get_rcvhdrtail(pd); - if (l == hdrqtail) - goto bail; + rhf_addr = (__le32 *) pd->port_rcvhdrq + l + dd->ipath_rhf_offset; + if (dd->ipath_flags & IPATH_NODMA_RTAIL) { + u32 seq = ipath_hdrget_seq(rhf_addr); -reloop: - for (i = 0; l != hdrqtail; i++) { - u32 qp; - u8 *bthbytes; - - rc = (u64 *) (pd->port_rcvhdrq + (l << 
2)); - hdr = (struct ipath_message_header *)&rc[1]; - /* - * could make a network order version of IPATH_KD_QP, and - * do the obvious shift before masking to speed this up. - */ - qp = ntohl(hdr->bth[1]) & 0xffffff; - bthbytes = (u8 *) hdr->bth; + if (seq != pd->port_seq_cnt) + goto bail; + hdrqtail = 0; + } else { + hdrqtail = ipath_get_rcvhdrtail(pd); + if (l == hdrqtail) + goto bail; + smp_rmb(); + } - eflags = ipath_hdrget_err_flags((__le32 *) rc); - etype = ipath_hdrget_rcv_type((__le32 *) rc); +reloop: + for (last = 0, i = 1; !last; i++) { + hdr = dd->ipath_f_get_msgheader(dd, rhf_addr); + eflags = ipath_hdrget_err_flags(rhf_addr); + etype = ipath_hdrget_rcv_type(rhf_addr); /* total length */ - tlen = ipath_hdrget_length_in_bytes((__le32 *) rc); + tlen = ipath_hdrget_length_in_bytes(rhf_addr); ebuf = NULL; - if (etype != RCVHQ_RCV_TYPE_EXPECTED) { + if ((dd->ipath_flags & IPATH_NODMA_RTAIL) ? + ipath_hdrget_use_egr_buf(rhf_addr) : + (etype != RCVHQ_RCV_TYPE_EXPECTED)) { /* - * it turns out that the chips uses an eager buffer + * It turns out that the chip uses an eager buffer * for all non-expected packets, whether it "needs" * one or not. So always get the index, but don't * set ebuf (so we try to copy data) unless the * length requires it. */ - etail = ipath_hdrget_index((__le32 *) rc); + etail = ipath_hdrget_index(rhf_addr); + updegr = 1; if (tlen > sizeof(*hdr) || etype == RCVHQ_RCV_TYPE_NON_KD) ebuf = ipath_get_egrbuf(dd, etail); @@ -1173,75 +1170,91 @@ reloop: * packets; only ipathhdrerr should be set. 
*/ - if (etype != RCVHQ_RCV_TYPE_NON_KD && etype != - RCVHQ_RCV_TYPE_ERROR && ipath_hdrget_ipath_ver( - hdr->iph.ver_port_tid_offset) != - IPS_PROTO_VERSION) { + if (etype != RCVHQ_RCV_TYPE_NON_KD && + etype != RCVHQ_RCV_TYPE_ERROR && + ipath_hdrget_ipath_ver(hdr->iph.ver_port_tid_offset) != + IPS_PROTO_VERSION) ipath_cdbg(PKT, "Bad InfiniPath protocol version " "%x\n", etype); - } if (unlikely(eflags)) - ipath_rcv_hdrerr(dd, eflags, l, etail, rc); + ipath_rcv_hdrerr(dd, eflags, l, etail, rhf_addr, hdr); else if (etype == RCVHQ_RCV_TYPE_NON_KD) { - ipath_ib_rcv(dd->verbs_dev, rc + 1, ebuf, tlen); + ipath_ib_rcv(dd->verbs_dev, (u32 *)hdr, ebuf, tlen); if (dd->ipath_lli_counter) dd->ipath_lli_counter--; + } else if (etype == RCVHQ_RCV_TYPE_EAGER) { + u8 opcode = be32_to_cpu(hdr->bth[0]) >> 24; + u32 qp = be32_to_cpu(hdr->bth[1]) & 0xffffff; ipath_cdbg(PKT, "typ %x, opcode %x (eager, " "qp=%x), len %x; ignored\n", - etype, bthbytes[0], qp, tlen); + etype, opcode, qp, tlen); } - else if (etype == RCVHQ_RCV_TYPE_EAGER) - ipath_cdbg(PKT, "typ %x, opcode %x (eager, " - "qp=%x), len %x; ignored\n", - etype, bthbytes[0], qp, tlen); else if (etype == RCVHQ_RCV_TYPE_EXPECTED) ipath_dbg("Bug: Expected TID, opcode %x; ignored\n", - be32_to_cpu(hdr->bth[0]) & 0xff); + be32_to_cpu(hdr->bth[0]) >> 24); else { /* * error packet, type of error unknown. * Probably type 3, but we don't know, so don't * even try to print the opcode, etc. + * Usually caused by a "bad packet", that has no + * BTH, when the LRH says it should. */ - ipath_dbg("Error Pkt, but no eflags! egrbuf %x, " - "len %x\nhdrq@%lx;hdrq+%x rhf: %llx; " - "hdr %llx %llx %llx %llx %llx\n", - etail, tlen, (unsigned long) rc, l, - (unsigned long long) rc[0], - (unsigned long long) rc[1], - (unsigned long long) rc[2], - (unsigned long long) rc[3], - (unsigned long long) rc[4], - (unsigned long long) rc[5]); + ipath_cdbg(ERRPKT, "Error Pkt, but no eflags! 
egrbuf" + " %x, len %x hdrq+%x rhf: %Lx\n", + etail, tlen, l, + le64_to_cpu(*(__le64 *) rhf_addr)); + if (ipath_debug & __IPATH_ERRPKTDBG) { + u32 j, *d, dw = rsize-2; + if (rsize > (tlen>>2)) + dw = tlen>>2; + d = (u32 *)hdr; + printk(KERN_DEBUG "EPkt rcvhdr(%x dw):\n", + dw); + for (j = 0; j < dw; j++) + printk(KERN_DEBUG "%8x%s", d[j], + (j%8) == 7 ? "\n" : " "); + printk(KERN_DEBUG ".\n"); + } } l += rsize; if (l >= maxcnt) l = 0; - if (etype != RCVHQ_RCV_TYPE_EXPECTED) - updegr = 1; + rhf_addr = (__le32 *) pd->port_rcvhdrq + + l + dd->ipath_rhf_offset; + if (dd->ipath_flags & IPATH_NODMA_RTAIL) { + u32 seq = ipath_hdrget_seq(rhf_addr); + + if (++pd->port_seq_cnt > 13) + pd->port_seq_cnt = 1; + if (seq != pd->port_seq_cnt) + last = 1; + } else if (l == hdrqtail) + last = 1; /* * update head regs on last packet, and every 16 packets. * Reduce bus traffic, while still trying to prevent * rcvhdrq overflows, for when the queue is nearly full */ - if (l == hdrqtail || (i && !(i&0xf))) { - u64 lval; - if (l == hdrqtail) - /* request IBA6120 interrupt only on last */ - lval = dd->ipath_rhdrhead_intr_off | l; - else - lval = l; - ipath_write_ureg(dd, ur_rcvhdrhead, lval, 0); + if (last || !(i & 0xf)) { + u64 lval = l; + + /* request IBA6120 and 7220 interrupt only on last */ + if (last) + lval |= dd->ipath_rhdrhead_intr_off; + ipath_write_ureg(dd, ur_rcvhdrhead, lval, + pd->port_port); if (updegr) { ipath_write_ureg(dd, ur_rcvegrindexhead, - etail, 0); + etail, pd->port_port); updegr = 0; } } } - if (!dd->ipath_rhdrhead_intr_off && !reloop) { + if (!dd->ipath_rhdrhead_intr_off && !reloop && + !(dd->ipath_flags & IPATH_NODMA_RTAIL)) { /* IBA6110 workaround; we can have a race clearing chip * interrupt with another interrupt about to be delivered, * and can clear it before it is delivered on the GPIO @@ -1638,19 +1651,27 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd, ret = -ENOMEM; goto bail; } - pd->port_rcvhdrtail_kvaddr = dma_alloc_coherent( - &dd->pcidev->dev, 
PAGE_SIZE, &phys_hdrqtail, GFP_KERNEL); - if (!pd->port_rcvhdrtail_kvaddr) { - ipath_dev_err(dd, "attempt to allocate 1 page " - "for port %u rcvhdrqtailaddr failed\n", - pd->port_port); - ret = -ENOMEM; - dma_free_coherent(&dd->pcidev->dev, amt, - pd->port_rcvhdrq, pd->port_rcvhdrq_phys); - pd->port_rcvhdrq = NULL; - goto bail; + + if (!(dd->ipath_flags & IPATH_NODMA_RTAIL)) { + pd->port_rcvhdrtail_kvaddr = dma_alloc_coherent( + &dd->pcidev->dev, PAGE_SIZE, &phys_hdrqtail, + GFP_KERNEL); + if (!pd->port_rcvhdrtail_kvaddr) { + ipath_dev_err(dd, "attempt to allocate 1 page " + "for port %u rcvhdrqtailaddr " + "failed\n", pd->port_port); + ret = -ENOMEM; + dma_free_coherent(&dd->pcidev->dev, amt, + pd->port_rcvhdrq, + pd->port_rcvhdrq_phys); + pd->port_rcvhdrq = NULL; + goto bail; + } + pd->port_rcvhdrqtailaddr_phys = phys_hdrqtail; + ipath_cdbg(VERBOSE, "port %d hdrtailaddr, %llx " + "physical\n", pd->port_port, + (unsigned long long) phys_hdrqtail); } - pd->port_rcvhdrqtailaddr_phys = phys_hdrqtail; pd->port_rcvhdrq_size = amt; @@ -1660,10 +1681,6 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd, (unsigned long) pd->port_rcvhdrq_phys, (unsigned long) pd->port_rcvhdrq_size, pd->port_port); - - ipath_cdbg(VERBOSE, "port %d hdrtailaddr, %llx physical\n", - pd->port_port, - (unsigned long long) phys_hdrqtail); } else ipath_cdbg(VERBOSE, "reuse port %d rcvhdrq @%p %llx phys; " @@ -1687,7 +1704,6 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd, ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdraddr, pd->port_port, pd->port_rcvhdrq_phys); - ret = 0; bail: return ret; } @@ -2222,7 +2238,7 @@ void ipath_free_pddata(struct ipath_devdata *dd, struct ipath_portdata *pd) ipath_cdbg(VERBOSE, "free closed port %d " "ipath_port0_skbinfo @ %p\n", pd->port_port, skbinfo); - for (e = 0; e < dd->ipath_rcvegrcnt; e++) + for (e = 0; e < dd->ipath_p0_rcvegrcnt; e++) if (skbinfo[e].skb) { pci_unmap_single(dd->pcidev, skbinfo[e].phys, dd->ipath_ibmaxlen, diff --git 
a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index 1b232b2..17d4e97 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -1930,22 +1930,25 @@ static int ipath_do_user_init(struct file *fp, pd->port_hdrqfull_poll = pd->port_hdrqfull; /* - * now enable the port; the tail registers will be written to memory - * by the chip as soon as it sees the write to - * dd->ipath_kregs->kr_rcvctrl. The update only happens on - * transition from 0 to 1, so clear it first, then set it as part of - * enabling the port. This will (very briefly) affect any other - * open ports, but it shouldn't be long enough to be an issue. - * We explictly set the in-memory copy to 0 beforehand, so we don't - * have to wait to be sure the DMA update has happened. + * Now enable the port for receive. + * For chips that are set to DMA the tail register to memory + * when they change (and when the update bit transitions from + * 0 to 1. So for those chips, we turn it off and then back on. + * This will (very briefly) affect any other open ports, but the + * duration is very short, and therefore isn't an issue. We + * explictly set the in-memory tail copy to 0 beforehand, so we + * don't have to wait to be sure the DMA update has happened + * (chip resets head/tail to 0 on transition to enable). 
*/ - if (pd->port_rcvhdrtail_kvaddr) - ipath_clear_rcvhdrtail(pd); set_bit(dd->ipath_r_portenable_shift + pd->port_port, &dd->ipath_rcvctrl); - ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, + if (!(dd->ipath_flags & IPATH_NODMA_RTAIL)) { + if (pd->port_rcvhdrtail_kvaddr) + ipath_clear_rcvhdrtail(pd); + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl & ~(1ULL << dd->ipath_r_tailupd_shift)); + } ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); /* Notify any waiting slaves */ @@ -1973,14 +1976,15 @@ static void unlock_expected_tids(struct ipath_portdata *pd) ipath_cdbg(VERBOSE, "Port %u unlocking any locked expTID pages\n", pd->port_port); for (i = port_tidbase; i < maxtid; i++) { - if (!dd->ipath_pageshadow[i]) + struct page *ps = dd->ipath_pageshadow[i]; + + if (!ps) continue; + dd->ipath_pageshadow[i] = NULL; pci_unmap_page(dd->pcidev, dd->ipath_physshadow[i], PAGE_SIZE, PCI_DMA_FROMDEVICE); - ipath_release_user_pages_on_close(&dd->ipath_pageshadow[i], - 1); - dd->ipath_pageshadow[i] = NULL; + ipath_release_user_pages_on_close(&ps, 1); cnt++; ipath_stats.sps_pageunlocks++; } diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index d241f1c..02831ad 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -306,7 +306,9 @@ static const struct ipath_cregs ipath_ht_cregs = { /* kr_intstatus, kr_intclear, kr_intmask bits */ #define INFINIPATH_I_RCVURG_MASK ((1U<<9)-1) +#define INFINIPATH_I_RCVURG_SHIFT 0 #define INFINIPATH_I_RCVAVAIL_MASK ((1U<<9)-1) +#define INFINIPATH_I_RCVAVAIL_SHIFT 12 /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ #define INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT 0 diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index ce0f40f..907b61b 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ 
-316,7 +316,9 @@ static const struct ipath_cregs ipath_pe_cregs = { /* kr_intstatus, kr_intclear, kr_intmask bits */ #define INFINIPATH_I_RCVURG_MASK ((1U<<5)-1) +#define INFINIPATH_I_RCVURG_SHIFT 0 #define INFINIPATH_I_RCVAVAIL_MASK ((1U<<5)-1) +#define INFINIPATH_I_RCVAVAIL_SHIFT 12 /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ #define INFINIPATH_HWE_PCIEMEMPARITYERR_MASK 0x000000000000003fULL diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 94f938f..720ff4d 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -219,14 +219,14 @@ static struct ipath_portdata *create_portdata0(struct ipath_devdata *dd) pd->port_cnt = 1; /* The port 0 pkey table is used by the layer interface. */ pd->port_pkeys[0] = IPATH_DEFAULT_P_KEY; + pd->port_seq_cnt = 1; } return pd; } -static int init_chip_first(struct ipath_devdata *dd, - struct ipath_portdata **pdp) +static int init_chip_first(struct ipath_devdata *dd) { - struct ipath_portdata *pd = NULL; + struct ipath_portdata *pd; int ret = 0; u64 val; @@ -242,12 +242,14 @@ static int init_chip_first(struct ipath_devdata *dd, else if (ipath_cfgports <= dd->ipath_portcnt) { dd->ipath_cfgports = ipath_cfgports; ipath_dbg("Configured to use %u ports out of %u in chip\n", - dd->ipath_cfgports, dd->ipath_portcnt); + dd->ipath_cfgports, ipath_read_kreg32(dd, + dd->ipath_kregs->kr_portcnt)); } else { dd->ipath_cfgports = dd->ipath_portcnt; ipath_dbg("Tried to configured to use %u ports; chip " "only supports %u\n", ipath_cfgports, - dd->ipath_portcnt); + ipath_read_kreg32(dd, + dd->ipath_kregs->kr_portcnt)); } /* * Allocate full portcnt array, rather than just cfgports, because @@ -324,36 +326,39 @@ static int init_chip_first(struct ipath_devdata *dd, mutex_init(&dd->ipath_eep_lock); done: - *pdp = pd; return ret; } /** * init_chip_reset - re-initialize after a reset, or enable * @dd: the infinipath device - * 
@pdp: output for port data * * sanity check at least some of the values after reset, and * ensure no receive or transmit (explictly, in case reset * failed */ -static int init_chip_reset(struct ipath_devdata *dd, - struct ipath_portdata **pdp) +static int init_chip_reset(struct ipath_devdata *dd) { u32 rtmp; + int i; + + /* + * ensure chip does no sends or receives, tail updates, or + * pioavail updates while we re-initialize + */ + dd->ipath_rcvctrl &= ~(1ULL << dd->ipath_r_tailupd_shift); + for (i = 0; i < dd->ipath_portcnt; i++) { + clear_bit(dd->ipath_r_portenable_shift + i, + &dd->ipath_rcvctrl); + clear_bit(dd->ipath_r_intravail_shift + i, + &dd->ipath_rcvctrl); + } + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, + dd->ipath_rcvctrl); - *pdp = dd->ipath_pd[0]; - /* ensure chip does no sends or receives while we re-initialize */ - dd->ipath_control = dd->ipath_sendctrl = dd->ipath_rcvctrl = 0U; - ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); ipath_write_kreg(dd, dd->ipath_kregs->kr_control, dd->ipath_control); - rtmp = ipath_read_kreg32(dd, dd->ipath_kregs->kr_portcnt); - if (dd->ipath_portcnt != rtmp) - dev_info(&dd->pcidev->dev, "portcnt was %u before " - "reset, now %u, using original\n", - dd->ipath_portcnt, rtmp); rtmp = ipath_read_kreg32(dd, dd->ipath_kregs->kr_rcvtidcnt); if (rtmp != dd->ipath_rcvtidcnt) dev_info(&dd->pcidev->dev, "tidcnt was %u before " @@ -456,10 +461,10 @@ static void init_shadow_tids(struct ipath_devdata *dd) dd->ipath_physshadow = addrs; } -static void enable_chip(struct ipath_devdata *dd, - struct ipath_portdata *pd, int reinit) +static void enable_chip(struct ipath_devdata *dd, int reinit) { u32 val; + u64 rcvmask; unsigned long flags; int i; @@ -478,12 +483,15 @@ static void enable_chip(struct ipath_devdata *dd, spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); /* - * enable port 0 receive, and receive interrupt. 
other ports - * done as user opens and inits them. + * Enable kernel ports' receive and receive interrupt. + * Other ports done as user opens and inits them. */ - dd->ipath_rcvctrl = (1ULL << dd->ipath_r_tailupd_shift) | - (1ULL << dd->ipath_r_portenable_shift) | - (1ULL << dd->ipath_r_intravail_shift); + rcvmask = 1ULL; + dd->ipath_rcvctrl |= (rcvmask << dd->ipath_r_portenable_shift) | + (rcvmask << dd->ipath_r_intravail_shift); + if (!(dd->ipath_flags & IPATH_NODMA_RTAIL)) + dd->ipath_rcvctrl |= (1ULL << dd->ipath_r_tailupd_shift); + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); @@ -494,8 +502,8 @@ static void enable_chip(struct ipath_devdata *dd, dd->ipath_flags |= IPATH_INITTED; /* - * init our shadow copies of head from tail values, and write - * head values to match. + * Init our shadow copies of head from tail values, + * and write head values to match. */ val = ipath_read_ureg32(dd, ur_rcvegrindextail, 0); ipath_write_ureg(dd, ur_rcvegrindexhead, val, 0); @@ -529,8 +537,7 @@ static void enable_chip(struct ipath_devdata *dd, dd->ipath_flags |= IPATH_PRESENT; } -static int init_housekeeping(struct ipath_devdata *dd, - struct ipath_portdata **pdp, int reinit) +static int init_housekeeping(struct ipath_devdata *dd, int reinit) { char boardn[32]; int ret = 0; @@ -591,18 +598,9 @@ static int init_housekeeping(struct ipath_devdata *dd, ipath_write_kreg(dd, dd->ipath_kregs->kr_errorclear, INFINIPATH_E_RESET); - if (reinit) - ret = init_chip_reset(dd, pdp); - else - ret = init_chip_first(dd, pdp); - - if (ret) - goto done; - - ipath_cdbg(VERBOSE, "Revision %llx (PCI %x), %u ports, %u tids, " - "%u egrtids\n", (unsigned long long) dd->ipath_revision, - dd->ipath_pcirev, dd->ipath_portcnt, dd->ipath_rcvtidcnt, - dd->ipath_rcvegrcnt); + ipath_cdbg(VERBOSE, "Revision %llx (PCI %x)\n", + (unsigned long long) dd->ipath_revision, + dd->ipath_pcirev); if (((dd->ipath_revision >> INFINIPATH_R_SOFTWARE_SHIFT) & INFINIPATH_R_SOFTWARE_MASK) != 
IPATH_CHIP_SWVERSION) { @@ -641,6 +639,14 @@ static int init_housekeeping(struct ipath_devdata *dd, ipath_dbg("%s", dd->ipath_boardversion); + if (ret) + goto done; + + if (reinit) + ret = init_chip_reset(dd); + else + ret = init_chip_first(dd); + done: return ret; } @@ -666,11 +672,11 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) u32 val32, kpiobufs; u32 piobufs, uports; u64 val; - struct ipath_portdata *pd = NULL; /* keep gcc4 happy */ + struct ipath_portdata *pd; gfp_t gfp_flags = GFP_USER | __GFP_COMP; unsigned long flags; - ret = init_housekeeping(dd, &pd, reinit); + ret = init_housekeeping(dd, reinit); if (ret) goto done; @@ -690,7 +696,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) * we now use routines that backend onto __get_free_pages, the * rest would be wasted. */ - dd->ipath_rcvhdrcnt = dd->ipath_rcvegrcnt; + dd->ipath_rcvhdrcnt = max(dd->ipath_p0_rcvegrcnt, dd->ipath_rcvegrcnt); ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvhdrcnt, dd->ipath_rcvhdrcnt); @@ -721,8 +727,8 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) if (kpiobufs + (uports * IPATH_MIN_USER_PORT_BUFCNT) > piobufs) { int i = (int) piobufs - (int) (uports * IPATH_MIN_USER_PORT_BUFCNT); - if (i < 0) - i = 0; + if (i < 1) + i = 1; dev_info(&dd->pcidev->dev, "Allocating %d PIO bufs of " "%d for kernel leaves too few for %d user ports " "(%d each); using %u\n", kpiobufs, @@ -741,6 +747,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) ipath_dbg("allocating %u pbufs/port leaves %u unused, " "add to kernel\n", dd->ipath_pbufsport, val32); dd->ipath_lastport_piobuf -= val32; + kpiobufs += val32; ipath_dbg("%u pbufs/port leaves %u unused, add to kernel\n", dd->ipath_pbufsport, val32); } @@ -759,8 +766,10 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) */ ipath_cancel_sends(dd, 0); - /* early_init sets rcvhdrentsize and rcvhdrsize, so this must be - * done after early_init */ + /* + * Early_init sets rcvhdrentsize and rcvhdrsize, 
so this must be + * done after early_init. + */ dd->ipath_hdrqlast = dd->ipath_rcvhdrentsize * (dd->ipath_rcvhdrcnt - 1); ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvhdrentsize, @@ -835,58 +844,65 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) /* enable errors that are masked, at least this first time. */ ipath_write_kreg(dd, dd->ipath_kregs->kr_errormask, ~dd->ipath_maskederrs); - dd->ipath_errormask = ipath_read_kreg64(dd, - dd->ipath_kregs->kr_errormask); + dd->ipath_maskederrs = 0; /* don't re-enable ignored in timer */ + dd->ipath_errormask = + ipath_read_kreg64(dd, dd->ipath_kregs->kr_errormask); /* clear any interrupts up to this point (ints still not enabled) */ ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, -1LL); + dd->ipath_f_tidtemplate(dd); + /* * Set up the port 0 (kernel) rcvhdr q and egr TIDs. If doing * re-init, the simplest way to handle this is to free * existing, and re-allocate. * Need to re-create rest of port 0 portdata as well. */ + pd = dd->ipath_pd[0]; if (reinit) { - /* Alloc and init new ipath_portdata for port0, + struct ipath_portdata *npd; + + /* + * Alloc and init new ipath_portdata for port0, * Then free old pd. Could lead to fragmentation, but also * makes later support for hot-swap easier. 
*/ - struct ipath_portdata *npd; npd = create_portdata0(dd); if (npd) { ipath_free_pddata(dd, pd); - dd->ipath_pd[0] = pd = npd; + dd->ipath_pd[0] = npd; + pd = npd; } else { - ipath_dev_err(dd, "Unable to allocate portdata for" - " port 0, failing\n"); + ipath_dev_err(dd, "Unable to allocate portdata" + " for port 0, failing\n"); ret = -ENOMEM; goto done; } } - dd->ipath_f_tidtemplate(dd); ret = ipath_create_rcvhdrq(dd, pd); - if (!ret) { - dd->ipath_hdrqtailptr = - (volatile __le64 *)pd->port_rcvhdrtail_kvaddr; + if (!ret) ret = create_port0_egr(dd); - } - if (ret) - ipath_dev_err(dd, "failed to allocate port 0 (kernel) " + if (ret) { + ipath_dev_err(dd, "failed to allocate kernel port's " "rcvhdrq and/or egr bufs\n"); + goto done; + } else - enable_chip(dd, pd, reinit); + enable_chip(dd, reinit); - - if (!ret && !reinit) { - /* used when we close a port, for DMA already in flight at close */ + if (!reinit) { + /* + * Used when we close a port, for DMA already in flight + * at close. + */ dd->ipath_dummy_hdrq = dma_alloc_coherent( - &dd->pcidev->dev, pd->port_rcvhdrq_size, + &dd->pcidev->dev, dd->ipath_pd[0]->port_rcvhdrq_size, &dd->ipath_dummy_hdrq_phys, gfp_flags); if (!dd->ipath_dummy_hdrq) { dev_info(&dd->pcidev->dev, "Couldn't allocate 0x%lx bytes for dummy hdrq\n", - pd->port_rcvhdrq_size); + dd->ipath_pd[0]->port_rcvhdrq_size); /* fallback to just 0'ing */ dd->ipath_dummy_hdrq_phys = 0UL; } diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 41329e7..826b96b 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -695,8 +695,7 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) struct ipath_portdata *pd = dd->ipath_pd[i]; if (i == 0) { hd = pd->port_head; - tl = (u32) le64_to_cpu( - *dd->ipath_hdrqtailptr); + tl = ipath_get_hdrqtail(pd); } else if (pd && pd->port_cnt && pd->port_rcvhdrtail_kvaddr) { /* @@ -732,8 +731,7 @@ static int 
handle_errors(struct ipath_devdata *dd, ipath_err_t errs) * vs user) */ ipath_stats.sps_etidfull++; - if (pd->port_head != - (u32) le64_to_cpu(*dd->ipath_hdrqtailptr)) + if (pd->port_head != ipath_get_hdrqtail(pd)) chkerrpkts = 1; } @@ -952,7 +950,7 @@ set: * process was waiting for a packet to arrive, and didn't want * to poll */ -static void handle_urcv(struct ipath_devdata *dd, u32 istat) +static void handle_urcv(struct ipath_devdata *dd, u64 istat) { u64 portr; int i; @@ -968,10 +966,10 @@ static void handle_urcv(struct ipath_devdata *dd, u32 istat) * and ipath_poll_next()... */ rmb(); - portr = ((istat >> INFINIPATH_I_RCVAVAIL_SHIFT) & - dd->ipath_i_rcvavail_mask) - | ((istat >> INFINIPATH_I_RCVURG_SHIFT) & - dd->ipath_i_rcvurg_mask); + portr = ((istat >> dd->ipath_i_rcvavail_shift) & + dd->ipath_i_rcvavail_mask) | + ((istat >> dd->ipath_i_rcvurg_shift) & + dd->ipath_i_rcvurg_mask); for (i = 1; i < dd->ipath_cfgports; i++) { struct ipath_portdata *pd = dd->ipath_pd[i]; @@ -991,7 +989,7 @@ static void handle_urcv(struct ipath_devdata *dd, u32 istat) } if (rcvdint) { /* only want to take one interrupt, so turn off the rcv - * interrupt for all the ports that we did the wakeup on + * interrupt for all the ports that we set the rcv_waiting * (but never for kernel port) */ ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, @@ -1006,8 +1004,7 @@ irqreturn_t ipath_intr(int irq, void *data) ipath_err_t estat = 0; irqreturn_t ret; static unsigned unexpected = 0; - static const u32 port0rbits = (1U<ipath_kregs->kr_intclear, istat); /* - * handle port0 receive before checking for pio buffers available, - * since receives can overflow; piobuf waiters can afford a few - * extra cycles, since they were waiting anyway, and user's waiting - * for receive are at the bottom. 
+ * Handle kernel receive queues before checking for pio buffers + * available since receives can overflow; piobuf waiters can afford + * a few extra cycles, since they were waiting anyway, and user's + * waiting for receive are at the bottom. */ - if (chk0rcv) { + kportrbits = (1ULL << dd->ipath_i_rcvavail_shift) | + (1ULL << dd->ipath_i_rcvurg_shift); + if (chk0rcv || (istat & kportrbits)) { + istat &= ~kportrbits; ipath_kreceive(dd->ipath_pd[0]); - istat &= ~port0rbits; } - if (istat & ((dd->ipath_i_rcvavail_mask << - INFINIPATH_I_RCVAVAIL_SHIFT) - | (dd->ipath_i_rcvurg_mask << - INFINIPATH_I_RCVURG_SHIFT))) + if (istat & ((dd->ipath_i_rcvavail_mask << dd->ipath_i_rcvavail_shift) | + (dd->ipath_i_rcvurg_mask << dd->ipath_i_rcvurg_shift))) handle_urcv(dd, istat); if (istat & INFINIPATH_I_SPIOBUFAVAIL) { diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 8018383..7fae888 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -175,6 +175,8 @@ struct ipath_portdata { u16 poll_type; /* port rcvhdrq head offset */ u32 port_head; + /* receive packet sequence counter */ + u32 port_seq_cnt; }; struct sk_buff; @@ -224,11 +226,6 @@ struct ipath_devdata { unsigned long ipath_physaddr; /* base of memory alloced for ipath_kregbase, for free */ u64 *ipath_kregalloc; - /* - * virtual address where port0 rcvhdrqtail updated for this unit. - * only written to by the chip, not the driver. 
- */ - volatile __le64 *ipath_hdrqtailptr; /* ipath_cfgports pointers */ struct ipath_portdata **ipath_pd; /* sk_buffs used by port 0 eager receive queue */ @@ -286,6 +283,7 @@ struct ipath_devdata { /* per chip actions needed for IB Link up/down changes */ int (*ipath_f_ib_updown)(struct ipath_devdata *, int, u64); + unsigned ipath_lastegr_idx; struct ipath_ibdev *verbs_dev; struct timer_list verbs_timer; /* total dwords sent (summed from counter) */ @@ -593,14 +591,6 @@ struct ipath_devdata { u8 ipath_minrev; /* board rev, from ipath_revision */ u8 ipath_boardrev; - - u8 ipath_r_portenable_shift; - u8 ipath_r_intravail_shift; - u8 ipath_r_tailupd_shift; - u8 ipath_r_portcfg_shift; - - /* unit # of this chip, if present */ - int ipath_unit; /* saved for restore after reset */ u8 ipath_pci_cacheline; /* LID mask control */ @@ -616,6 +606,14 @@ struct ipath_devdata { /* Rx Polarity inversion (compensate for ~tx on partner) */ u8 ipath_rx_pol_inv; + u8 ipath_r_portenable_shift; + u8 ipath_r_intravail_shift; + u8 ipath_r_tailupd_shift; + u8 ipath_r_portcfg_shift; + + /* unit # of this chip, if present */ + int ipath_unit; + /* local link integrity counter */ u32 ipath_lli_counter; /* local link integrity errors */ @@ -645,8 +643,8 @@ struct ipath_devdata { * Below should be computable from number of ports, * since they are never modified. 
*/ - u32 ipath_i_rcvavail_mask; - u32 ipath_i_rcvurg_mask; + u64 ipath_i_rcvavail_mask; + u64 ipath_i_rcvurg_mask; u16 ipath_i_rcvurg_shift; u16 ipath_i_rcvavail_shift; @@ -836,6 +834,8 @@ void ipath_hol_event(unsigned long); #define IPATH_LINKUNK 0x400 /* Write combining flush needed for PIO */ #define IPATH_PIO_FLUSH_WC 0x1000 + /* DMA Receive tail pointer */ +#define IPATH_NODMA_RTAIL 0x2000 /* no IB cable, or no device on IB cable */ #define IPATH_NOCABLE 0x4000 /* Supports port zero per packet receive interrupts via @@ -846,9 +846,9 @@ void ipath_hol_event(unsigned long); /* packet/word counters are 32 bit, else those 4 counters * are 64bit */ #define IPATH_32BITCOUNTERS 0x20000 - /* can miss port0 rx interrupts */ /* Interrupt register is 64 bits */ #define IPATH_INTREG_64 0x40000 + /* can miss port0 rx interrupts */ #define IPATH_DISABLED 0x80000 /* administratively disabled */ /* Use GPIO interrupts for new counters */ #define IPATH_GPIO_ERRINTRS 0x100000 @@ -1036,6 +1036,27 @@ static inline u32 ipath_get_rcvhdrtail(const struct ipath_portdata *pd) pd->port_rcvhdrtail_kvaddr)); } +static inline u32 ipath_get_hdrqtail(const struct ipath_portdata *pd) +{ + const struct ipath_devdata *dd = pd->port_dd; + u32 hdrqtail; + + if (dd->ipath_flags & IPATH_NODMA_RTAIL) { + __le32 *rhf_addr; + u32 seq; + + rhf_addr = (__le32 *) pd->port_rcvhdrq + + pd->port_head + dd->ipath_rhf_offset; + seq = ipath_hdrget_seq(rhf_addr); + hdrqtail = pd->port_head; + if (seq == pd->port_seq_cnt) + hdrqtail++; + } else + hdrqtail = ipath_get_rcvhdrtail(pd); + + return hdrqtail; +} + static inline u64 ipath_read_ireg(const struct ipath_devdata *dd, ipath_kreg r) { return (dd->ipath_flags & IPATH_INTREG_64) ? 
diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h index f49f184..b7d87d3 100644 --- a/drivers/infiniband/hw/ipath/ipath_registers.h +++ b/drivers/infiniband/hw/ipath/ipath_registers.h @@ -86,8 +86,6 @@ #define INFINIPATH_R_QPMAP_ENABLE (1ULL << 38) /* kr_intstatus, kr_intclear, kr_intmask bits */ -#define INFINIPATH_I_RCVURG_SHIFT 0 -#define INFINIPATH_I_RCVAVAIL_SHIFT 12 #define INFINIPATH_I_ERROR 0x80000000 #define INFINIPATH_I_SPIOSENT 0x40000000 #define INFINIPATH_I_SPIOBUFAVAIL 0x20000000 diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index 57eb1d5..adff2f1 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -136,6 +136,7 @@ static void ipath_qcheck(struct ipath_devdata *dd) struct ipath_portdata *pd = dd->ipath_pd[0]; size_t blen = 0; char buf[128]; + u32 hdrqtail; *buf = 0; if (pd->port_hdrqfull != dd->ipath_p0_hdrqfull) { @@ -174,17 +175,18 @@ static void ipath_qcheck(struct ipath_devdata *dd) if (blen) ipath_dbg("%s\n", buf); - if (pd->port_head != (u32) - le64_to_cpu(*dd->ipath_hdrqtailptr)) { + hdrqtail = ipath_get_hdrqtail(pd); + if (pd->port_head != hdrqtail) { if (dd->ipath_lastport0rcv_cnt == ipath_stats.sps_port0pkts) { ipath_cdbg(PKT, "missing rcv interrupts? 
" - "port0 hd=%llx tl=%x; port0pkts %llx\n", - (unsigned long long) - le64_to_cpu(*dd->ipath_hdrqtailptr), - pd->port_head, + "port0 hd=%x tl=%x; port0pkts %llx; write" + " hd (w/intr)\n", + pd->port_head, hdrqtail, (unsigned long long) ipath_stats.sps_port0pkts); + ipath_write_ureg(dd, ur_rcvhdrhead, hdrqtail | + dd->ipath_rhdrhead_intr_off, pd->port_port); } dd->ipath_lastport0rcv_cnt = ipath_stats.sps_port0pkts; } From ralph.campbell at qlogic.com Wed Apr 2 15:49:22 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:22 -0700 Subject: [ofa-general] [PATCH 04/20] IB/ipath - Make link state transition code ignore (transient) link recovery In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224921.28598.25504.stgit@eng-46.mv.qlogic.com> From: Dave Olson The hardware-based recovery doesn't need any intervention, and in a few cases we can get a bit confused about state and skip steps such as turning off the link state LED when we consider recovery to be "down". So ignore this transition: either we recover in hardware, or we transition to down and handle it then.
Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_intr.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 826b96b..3bad601 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -300,6 +300,18 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, ltstate = ipath_ib_linktrstate(dd, ibcs); /* linktrainingtate */ /* + * Since going into a recovery state causes the link state to go + * down and since recovery is transitory, it is better if we "miss" + * ever seeing the link training state go into recovery (i.e., + * ignore this transition for link state special handling purposes) + * without even updating ipath_lastibcstat. + */ + if ((ltstate == INFINIPATH_IBCS_LT_STATE_RECOVERRETRAIN) || + (ltstate == INFINIPATH_IBCS_LT_STATE_RECOVERWAITRMT) || + (ltstate == INFINIPATH_IBCS_LT_STATE_RECOVERIDLE)) + goto done; + + /* * if linkstate transitions into INIT from any of the various down * states, or if it transitions from any of the up (INIT or better) * states into any of the down states (except link recovery), then @@ -316,7 +328,7 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, } } else if ((lastlstate >= INFINIPATH_IBCS_L_STATE_INIT || (dd->ipath_flags & IPATH_IB_FORCE_NOTIFY)) && - ltstate <= INFINIPATH_IBCS_LT_STATE_CFGDEBOUNCE && + ltstate <= INFINIPATH_IBCS_LT_STATE_CFGWAITRMT && ltstate != INFINIPATH_IBCS_LT_STATE_LINKUP) { int handled; handled = dd->ipath_f_ib_updown(dd, 0, ibcs); @@ -460,6 +472,8 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, skip_ibchange: dd->ipath_lastibcstat = ibcs; +done: + return; } static void handle_supp_msgs(struct ipath_devdata *dd, From ralph.campbell at qlogic.com Wed Apr 2 15:49:27 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:27 -0700 Subject: [ofa-general] 
[PATCH 05/20] IB/ipath - Add support for IBTA 1.2 Heartbeat In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224927.28598.57384.stgit@eng-46.mv.qlogic.com> From: Dave Olson This patch adds code to enable/disable the IBTA 1.2 heartbeat for testing if the HCA supports it. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_common.h | 2 ++ drivers/infiniband/hw/ipath/ipath_driver.c | 31 +++++++++++++++++++++++++--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index edd4183..3c05d4b 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -80,6 +80,8 @@ #define IPATH_IB_LINKDOWN_DISABLE 5 #define IPATH_IB_LINK_LOOPBACK 6 /* enable local loopback */ #define IPATH_IB_LINK_EXTERNAL 7 /* normal, disable local loopback */ +#define IPATH_IB_LINK_NO_HRTBT 8 /* disable Heartbeat, e.g. 
for loopback */ +#define IPATH_IB_LINK_HRTBT 9 /* enable heartbeat, normal, non-loopback */ /* * These 3 values (SDR and DDR may be ORed for auto-speed diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index eef2599..58aa255 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -1880,16 +1880,41 @@ int ipath_set_linkstate(struct ipath_devdata *dd, u8 newstate) dd->ipath_ibcctrl |= INFINIPATH_IBCC_LOOPBACK; ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl, dd->ipath_ibcctrl); + + /* turn heartbeat off, as it causes loopback to fail */ + dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_HRTBT, + IPATH_IB_HRTBT_OFF); + /* don't wait */ ret = 0; - goto bail; // no state change to wait for + goto bail; case IPATH_IB_LINK_EXTERNAL: - dev_info(&dd->pcidev->dev, "Disabling IB local loopback (normal)\n"); + dev_info(&dd->pcidev->dev, + "Disabling IB local loopback (normal)\n"); + dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_HRTBT, + IPATH_IB_HRTBT_ON); dd->ipath_ibcctrl &= ~INFINIPATH_IBCC_LOOPBACK; ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl, dd->ipath_ibcctrl); + /* don't wait */ ret = 0; - goto bail; // no state change to wait for + goto bail; + + /* + * Heartbeat can be explicitly enabled by the user via + * "hrtbt_enable" "file", and if disabled, trying to enable here + * will have no effect. Implicit changes (heartbeat off when + * loopback on, and vice versa) are included to ease testing. 
+ */ + case IPATH_IB_LINK_HRTBT: + ret = dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_HRTBT, + IPATH_IB_HRTBT_ON); + goto bail; + + case IPATH_IB_LINK_NO_HRTBT: + ret = dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_HRTBT, + IPATH_IB_HRTBT_OFF); + goto bail; default: ipath_dbg("Invalid linkstate 0x%x requested\n", newstate); From ralph.campbell at qlogic.com Wed Apr 2 15:49:32 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:32 -0700 Subject: [ofa-general] [PATCH 06/20] IB/ipath - set LID filtering for HCAs that support it. In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224932.28598.58514.stgit@eng-46.mv.qlogic.com> From: Dave Olson Whenever the LID is set, notify the HCA specific code so that the appropriate HW registers can be updated. Also log the info on the console at low priority. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_driver.c | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 58aa255..53f8ae4 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -2006,11 +2006,16 @@ bail: return ret; } -int ipath_set_lid(struct ipath_devdata *dd, u32 arg, u8 lmc) +int ipath_set_lid(struct ipath_devdata *dd, u32 lid, u8 lmc) { - dd->ipath_lid = arg; + dd->ipath_lid = lid; dd->ipath_lmc = lmc; + dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_LIDLMC, lid | + (~((1U << lmc) - 1)) << 16); + + dev_info(&dd->pcidev->dev, "We got a lid: 0x%x\n", lid); + return 0; } From ralph.campbell at qlogic.com Wed Apr 2 15:49:37 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:37 -0700 Subject: [ofa-general] [PATCH 07/20] IB/ipath - Enable reduced PIO updated for HCAs that support it. 
In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224937.28598.35413.stgit@eng-46.mv.qlogic.com> From: Dave Olson Newer HCAs have a threshold counter to reduce the number of DMAs the chip makes to update the PIO buffer availability status bits. This patch enables the feature. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_file_ops.c | 23 +++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_init_chip.c | 22 +++++++++++++++++++++- drivers/infiniband/hw/ipath/ipath_kernel.h | 1 + 3 files changed, 45 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index 17d4e97..eab69df 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -184,6 +184,29 @@ static int ipath_get_base_info(struct file *fp, kinfo->spi_piobufbase = (u64) pd->port_piobufs + dd->ipath_palign * kinfo->spi_piocnt * slave; } + + /* + * Set the PIO avail update threshold to no larger + * than the number of buffers per process. Note that + * we decrease it here, but won't ever increase it. 
+ */ + if (dd->ipath_pioupd_thresh && + kinfo->spi_piocnt < dd->ipath_pioupd_thresh) { + unsigned long flags; + + dd->ipath_pioupd_thresh = kinfo->spi_piocnt; + ipath_dbg("Decreased pio update threshold to %u\n", + dd->ipath_pioupd_thresh); + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl &= ~(INFINIPATH_S_UPDTHRESH_MASK + << INFINIPATH_S_UPDTHRESH_SHIFT); + dd->ipath_sendctrl |= dd->ipath_pioupd_thresh + << INFINIPATH_S_UPDTHRESH_SHIFT; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + } + if (shared) { kinfo->spi_port_uregbase = (u64) dd->ipath_uregbase + dd->ipath_ureg_align * pd->port_port; diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 720ff4d..1adafa9 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -341,6 +341,7 @@ static int init_chip_reset(struct ipath_devdata *dd) { u32 rtmp; int i; + unsigned long flags; /* * ensure chip does no sends or receives, tail updates, or @@ -356,8 +357,13 @@ static int init_chip_reset(struct ipath_devdata *dd) ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl = 0U; /* no sdma, etc */ ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); - ipath_write_kreg(dd, dd->ipath_kregs->kr_control, dd->ipath_control); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + + ipath_write_kreg(dd, dd->ipath_kregs->kr_control, 0ULL); rtmp = ipath_read_kreg32(dd, dd->ipath_kregs->kr_rcvtidcnt); if (rtmp != dd->ipath_rcvtidcnt) @@ -478,6 +484,14 @@ static void enable_chip(struct ipath_devdata *dd, int reinit) /* Enable PIO send, and update of PIOavail regs to memory. 
*/ dd->ipath_sendctrl = INFINIPATH_S_PIOENABLE | INFINIPATH_S_PIOBUFAVAILUPD; + + /* + * Set the PIO avail update threshold to host memory + * on chips that support it. + */ + if (dd->ipath_pioupd_thresh) + dd->ipath_sendctrl |= dd->ipath_pioupd_thresh + << INFINIPATH_S_UPDTHRESH_SHIFT; ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); @@ -757,6 +771,12 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) ipath_cdbg(VERBOSE, "%d PIO bufs for kernel out of %d total %u " "each for %u user ports\n", kpiobufs, piobufs, dd->ipath_pbufsport, uports); + if (dd->ipath_pioupd_thresh) { + if (dd->ipath_pbufsport < dd->ipath_pioupd_thresh) + dd->ipath_pioupd_thresh = dd->ipath_pbufsport; + if (kpiobufs < dd->ipath_pioupd_thresh) + dd->ipath_pioupd_thresh = kpiobufs; + } dd->ipath_f_early_init(dd); /* diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 7fae888..e96eec2 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -349,6 +349,7 @@ struct ipath_devdata { u32 ipath_lastrpkts; /* pio bufs allocated per port */ u32 ipath_pbufsport; + u32 ipath_pioupd_thresh; /* update threshold, some chips */ /* * number of ports configured as max; zero is set to number chip * supports, less gives more pio bufs/port, etc. From ralph.campbell at qlogic.com Wed Apr 2 15:49:42 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:42 -0700 Subject: [ofa-general] [PATCH 08/20] IB/ipath - fix check for no interrupts to reliably fallback to INTx In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224942.28598.29782.stgit@eng-46.mv.qlogic.com> From: Dave Olson Newer HCAs support MSI interrupts and also INTx interrupts. 
Fix the code so that INTx can be reliably enabled if MSI interrupts are not working. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_driver.c | 23 +++------------- drivers/infiniband/hw/ipath/ipath_init_chip.c | 36 +++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_kernel.h | 5 +-- 3 files changed, 42 insertions(+), 22 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 53f8ae4..b4a69ef 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -138,19 +138,6 @@ static struct pci_driver ipath_driver = { }, }; -static void ipath_check_status(struct work_struct *work) -{ - struct ipath_devdata *dd = container_of(work, struct ipath_devdata, - status_work.work); - - /* - * If we don't have any interrupts, let the user know and - * don't bother checking again. - */ - if (dd->ipath_int_counter == 0) - dev_err(&dd->pcidev->dev, "No interrupts detected.\n"); -} - static inline void read_bars(struct ipath_devdata *dd, struct pci_dev *dev, u32 *bar0, u32 *bar1) { @@ -218,8 +205,6 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev) dd->pcidev = pdev; pci_set_drvdata(pdev, dd); - INIT_DELAYED_WORK(&dd->status_work, ipath_check_status); - list_add(&dd->ipath_list, &ipath_dev_list); bail_unlock: @@ -620,9 +605,6 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, ipath_diag_add(dd); ipath_register_ib_device(dd); - /* Check that card status in STATUS_TIMEOUT seconds. 
*/ - schedule_delayed_work(&dd->status_work, HZ * STATUS_TIMEOUT); - goto bail; bail_irqsetup: @@ -753,7 +735,6 @@ static void __devexit ipath_remove_one(struct pci_dev *pdev) */ ipath_shutdown_device(dd); - cancel_delayed_work(&dd->status_work); flush_scheduled_work(); if (dd->verbs_dev) @@ -2195,6 +2176,10 @@ void ipath_shutdown_device(struct ipath_devdata *dd) del_timer_sync(&dd->ipath_stats_timer); dd->ipath_stats_timer_active = 0; } + if (dd->ipath_intrchk_timer.data) { + del_timer_sync(&dd->ipath_intrchk_timer); + dd->ipath_intrchk_timer.data = 0; + } /* * clear all interrupts and errors, so that the next time the driver diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 1adafa9..0db19c1 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -665,6 +665,28 @@ done: return ret; } +static void verify_interrupt(unsigned long opaque) +{ + struct ipath_devdata *dd = (struct ipath_devdata *) opaque; + + if (!dd) + return; /* being torn down */ + + /* + * If we don't have any interrupts, let the user know and + * don't bother checking again. 
+ */ + if (dd->ipath_int_counter == 0) { + if (!dd->ipath_f_intr_fallback(dd)) + dev_err(&dd->pcidev->dev, "No interrupts detected, " + "not usable.\n"); + else /* re-arm the timer to see if fallback works */ + mod_timer(&dd->ipath_intrchk_timer, jiffies + HZ/2); + } else + ipath_cdbg(VERBOSE, "%u interrupts at timer check\n", + dd->ipath_int_counter); +} + /** * ipath_init_chip - do the actual initialization sequence on the chip * @dd: the infinipath device @@ -968,6 +990,20 @@ done: 0ULL); /* chip is usable; mark it as initialized */ *dd->ipath_statusp |= IPATH_STATUS_INITTED; + + /* + * setup to verify we get an interrupt, and fallback + * to an alternate if necessary and possible + */ + if (!reinit) { + init_timer(&dd->ipath_intrchk_timer); + dd->ipath_intrchk_timer.function = + verify_interrupt; + dd->ipath_intrchk_timer.data = + (unsigned long) dd; + } + dd->ipath_intrchk_timer.expires = jiffies + HZ/2; + add_timer(&dd->ipath_intrchk_timer); } else ipath_dev_err(dd, "No interrupts enabled, couldn't " "setup interrupt address\n"); diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index e96eec2..90bbbc7 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -426,6 +426,8 @@ struct ipath_devdata { struct class_device *diag_class_dev; /* timer used to prevent stats overflow, error throttling, etc. */ struct timer_list ipath_stats_timer; + /* timer to verify interrupts work, and fallback if possible */ + struct timer_list ipath_intrchk_timer; void *ipath_dummy_hdrq; /* used after port close */ dma_addr_t ipath_dummy_hdrq_phys; @@ -629,9 +631,6 @@ struct ipath_devdata { u32 ipath_overrun_thresh_errs; u32 ipath_lli_errs; - /* status check work */ - struct delayed_work status_work; - /* * Not all devices managed by a driver instance are the same * type, so these fields must be per-device. 
From ralph.campbell at qlogic.com Wed Apr 2 15:49:47 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:47 -0700 Subject: [ofa-general] [PATCH 09/20] IB/ipath - fix up error handling In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224947.28598.52232.stgit@eng-46.mv.qlogic.com> This patch makes chip reset more robust and reduces lock contention between user and kernel TID register updates. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_iba6120.c | 79 ++++++++++++++++++++----- drivers/infiniband/hw/ipath/ipath_init_chip.c | 2 - drivers/infiniband/hw/ipath/ipath_kernel.h | 2 - 3 files changed, 66 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index 907b61b..c8d8f1a 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -558,12 +558,40 @@ static void ipath_pe_handle_hwerrors(struct ipath_devdata *dd, char *msg, dd->ipath_hwerrmask); } - if (*msg) + if (hwerrs) { + /* + * if any set that we aren't ignoring; only + * make the complaint once, in case it's stuck + * or recurring, and we get here multiple + * times. 
+ */ ipath_dev_err(dd, "%s hardware error\n", msg); - if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg) { + if (dd->ipath_flags & IPATH_INITTED) { + ipath_set_linkstate(dd, IPATH_IB_LINKDOWN); + ipath_setup_pe_setextled(dd, + INFINIPATH_IBCS_L_STATE_DOWN, + INFINIPATH_IBCS_LT_STATE_DISABLED); + ipath_dev_err(dd, "Fatal Hardware Error (freeze " + "mode), no longer usable, SN %.16s\n", + dd->ipath_serial); + isfatal = 1; + } + *dd->ipath_statusp &= ~IPATH_STATUS_IB_READY; + /* mark as having had error */ + *dd->ipath_statusp |= IPATH_STATUS_HWERROR; + /* + * mark as not usable, at a minimum until driver + * is reloaded, probably until reboot, since no + * other reset is possible. + */ + dd->ipath_flags &= ~IPATH_INITTED; + } else + *msg = 0; /* recovered from all of them */ + + if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg && msg) { /* - * for /sys status file ; if no trailing } is copied, we'll - * know it was truncated. + * for /sys status file ; if no trailing brace is copied, + * we'll know it was truncated. */ snprintf(dd->ipath_freezemsg, dd->ipath_freezelen, "{%s}", msg); @@ -1127,10 +1155,7 @@ static void ipath_init_pe_variables(struct ipath_devdata *dd) INFINIPATH_HWE_RXEMEMPARITYERR_MASK << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT; - dd->ipath_eep_st_masks[2].errs_to_log = - INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET; - - + dd->ipath_eep_st_masks[2].errs_to_log = INFINIPATH_E_RESET; dd->delay_mult = 2; /* SDR, 4X, can't change */ } @@ -1204,6 +1229,9 @@ static int ipath_setup_pe_reset(struct ipath_devdata *dd) u64 val; int i; int ret; + u16 cmdval; + + pci_read_config_word(dd->pcidev, PCI_COMMAND, &cmdval); /* Use ERROR so it shows up in logs, etc. 
*/ ipath_dev_err(dd, "Resetting InfiniPath unit %u\n", dd->ipath_unit); @@ -1231,10 +1259,14 @@ static int ipath_setup_pe_reset(struct ipath_devdata *dd) ipath_dev_err(dd, "rewrite of BAR1 failed: %d\n", r); /* now re-enable memory access */ + pci_write_config_word(dd->pcidev, PCI_COMMAND, cmdval); if ((r = pci_enable_device(dd->pcidev))) ipath_dev_err(dd, "pci_enable_device failed after " "reset: %d\n", r); - /* whether it worked or not, mark as present, again */ + /* + * whether it fully enabled or not, mark as present, + * again (but not INITTED) + */ dd->ipath_flags |= IPATH_PRESENT; val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_revision); if (val == dd->ipath_revision) { @@ -1273,6 +1305,11 @@ static void ipath_pe_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr, { u32 __iomem *tidp32 = (u32 __iomem *)tidptr; unsigned long flags = 0; /* keep gcc quiet */ + int tidx; + spinlock_t *tidlockp; + + if (!dd->ipath_kregbase) + return; if (pa != dd->ipath_tidinvalid) { if (pa & ((1U << 11) - 1)) { @@ -1302,14 +1339,22 @@ static void ipath_pe_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr, * call can be done from interrupt level for the port 0 eager TIDs, * so we have to use irqsave locks. */ - spin_lock_irqsave(&dd->ipath_tid_lock, flags); + /* + * Assumes tidptr always > ipath_egrtidbase + * if type == RCVHQ_RCV_TYPE_EAGER. + */ + tidx = tidptr - dd->ipath_egrtidbase; + + tidlockp = (type == RCVHQ_RCV_TYPE_EAGER && tidx < dd->ipath_rcvegrcnt) + ? 
&dd->ipath_kernel_tid_lock : &dd->ipath_user_tid_lock; + spin_lock_irqsave(tidlockp, flags); ipath_write_kreg(dd, dd->ipath_kregs->kr_scratch, 0xfeeddeaf); - if (dd->ipath_kregbase) - writel(pa, tidp32); + writel(pa, tidp32); ipath_write_kreg(dd, dd->ipath_kregs->kr_scratch, 0xdeadbeef); mmiowb(); - spin_unlock_irqrestore(&dd->ipath_tid_lock, flags); + spin_unlock_irqrestore(tidlockp, flags); } + /** * ipath_pe_put_tid_2 - write a TID in chip, Revision 2 or higher * @dd: the infinipath device @@ -1325,6 +1370,10 @@ static void ipath_pe_put_tid_2(struct ipath_devdata *dd, u64 __iomem *tidptr, u32 type, unsigned long pa) { u32 __iomem *tidp32 = (u32 __iomem *)tidptr; + u32 tidx; + + if (!dd->ipath_kregbase) + return; if (pa != dd->ipath_tidinvalid) { if (pa & ((1U << 11) - 1)) { @@ -1344,8 +1393,8 @@ static void ipath_pe_put_tid_2(struct ipath_devdata *dd, u64 __iomem *tidptr, else /* for now, always full 4KB page */ pa |= 2 << 29; } - if (dd->ipath_kregbase) - writel(pa, tidp32); + tidx = tidptr - dd->ipath_egrtidbase; + writel(pa, tidp32); mmiowb(); } diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 0db19c1..8d8e572 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -319,7 +319,7 @@ static int init_chip_first(struct ipath_devdata *dd) else ipath_dbg("%u 2k piobufs @ %p\n", dd->ipath_piobcnt2k, dd->ipath_pio2kbase); - spin_lock_init(&dd->ipath_tid_lock); + spin_lock_init(&dd->ipath_user_tid_lock); spin_lock_init(&dd->ipath_sendctrl_lock); spin_lock_init(&dd->ipath_gpio_lock); spin_lock_init(&dd->ipath_eep_st_lock); diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 90bbbc7..0504937 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -407,7 +407,7 @@ struct ipath_devdata { u64 __iomem *ipath_egrtidbase; /* lock to workaround chip 
bug 9437 and others */ spinlock_t ipath_kernel_tid_lock; - spinlock_t ipath_tid_lock; + spinlock_t ipath_user_tid_lock; spinlock_t ipath_sendctrl_lock; /* From ralph.campbell at qlogic.com Wed Apr 2 15:49:52 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:52 -0700 Subject: [ofa-general] [PATCH 10/20] IB/ipath - Header file changes to support IBA7220 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224952.28598.48402.stgit@eng-46.mv.qlogic.com> This is part of a patch series to add support for a new HCA. This patch adds new fields to the header files. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_common.h | 3 drivers/infiniband/hw/ipath/ipath_kernel.h | 165 ++++++++++++++++++++++++- drivers/infiniband/hw/ipath/ipath_registers.h | 138 +++++++++++++++------ drivers/infiniband/hw/ipath/ipath_verbs.h | 32 ++++- 4 files changed, 284 insertions(+), 54 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 3c05d4b..02fd310 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -201,6 +201,7 @@ typedef enum _ipath_ureg { #define IPATH_RUNTIME_RCVHDR_COPY 0x8 #define IPATH_RUNTIME_MASTER 0x10 #define IPATH_RUNTIME_NODMA_RTAIL 0x80 +#define IPATH_RUNTIME_SDMA 0x200 #define IPATH_RUNTIME_FORCE_PIOAVAIL 0x400 #define IPATH_RUNTIME_PIO_REGSWAPPED 0x800 @@ -539,7 +540,7 @@ struct ipath_diag_pkt { /* The second diag_pkt struct is the expanded version that allows * more control over the packet, specifically, by allowing a custom - * pbc (+ extra) qword, so that special modes and deliberate + * pbc (+ static rate) qword, so that special modes and deliberate * changes to CRCs can be used. The elements were also re-ordered * for better alignment and to avoid padding issues. 
*/ diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 0504937..8cdeab8 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -42,6 +42,8 @@ #include #include #include +#include +#include #include #include @@ -180,6 +182,8 @@ struct ipath_portdata { }; struct sk_buff; +struct ipath_sge_state; +struct ipath_verbs_txreq; /* * control information for layered drivers @@ -193,6 +197,37 @@ struct ipath_skbinfo { dma_addr_t phys; }; +struct ipath_sdma_txreq { + int flags; + int sg_count; + union { + struct scatterlist *sg; + void *map_addr; + }; + void (*callback)(void *, int); + void *callback_cookie; + int callback_status; + u16 start_idx; /* sdma private */ + u16 next_descq_idx; /* sdma private */ + struct list_head list; /* sdma private */ +}; + +struct ipath_sdma_desc { + __le64 qw[2]; +}; + +#define IPATH_SDMA_TXREQ_F_USELARGEBUF 0x1 +#define IPATH_SDMA_TXREQ_F_HEADTOHOST 0x2 +#define IPATH_SDMA_TXREQ_F_INTREQ 0x4 +#define IPATH_SDMA_TXREQ_F_FREEBUF 0x8 +#define IPATH_SDMA_TXREQ_F_FREEDESC 0x10 +#define IPATH_SDMA_TXREQ_F_VL15 0x20 + +#define IPATH_SDMA_TXREQ_S_OK 0 +#define IPATH_SDMA_TXREQ_S_SENDERROR 1 +#define IPATH_SDMA_TXREQ_S_ABORTED 2 +#define IPATH_SDMA_TXREQ_S_SHUTDOWN 3 + /* max dwords in small buffer packet */ #define IPATH_SMALLBUF_DWORDS (dd->ipath_piosize2k >> 2) @@ -385,6 +420,15 @@ struct ipath_devdata { u32 ipath_pcibar0; /* so we can rewrite it after a chip reset */ u32 ipath_pcibar1; + u32 ipath_x1_fix_tries; + u32 ipath_autoneg_tries; + u32 serdes_first_init_done; + + struct ipath_relock { + atomic_t ipath_relock_timer_active; + struct timer_list ipath_relock_timer; + unsigned int ipath_relock_interval; /* in jiffies */ + } ipath_relock_singleton; /* interrupt number */ int ipath_irq; @@ -431,8 +475,38 @@ struct ipath_devdata { void *ipath_dummy_hdrq; /* used after port close */ dma_addr_t ipath_dummy_hdrq_phys; + /* SendDMA related 
entries */ + spinlock_t ipath_sdma_lock; + u64 ipath_sdma_status; + unsigned long ipath_sdma_abort_jiffies; + unsigned long ipath_sdma_abort_intr_timeout; + unsigned long ipath_sdma_buf_jiffies; + struct ipath_sdma_desc *ipath_sdma_descq; + u64 ipath_sdma_descq_added; + u64 ipath_sdma_descq_removed; + int ipath_sdma_desc_nreserved; + u16 ipath_sdma_descq_cnt; + u16 ipath_sdma_descq_tail; + u16 ipath_sdma_descq_head; + u16 ipath_sdma_next_intr; + u16 ipath_sdma_reset_wait; + u8 ipath_sdma_generation; + struct tasklet_struct ipath_sdma_abort_task; + struct tasklet_struct ipath_sdma_notify_task; + struct list_head ipath_sdma_activelist; + struct list_head ipath_sdma_notifylist; + atomic_t ipath_sdma_vl15_count; + struct timer_list ipath_sdma_vl15_timer; + + dma_addr_t ipath_sdma_descq_phys; + volatile __le64 *ipath_sdma_head_dma; + dma_addr_t ipath_sdma_head_phys; + unsigned long ipath_ureg_align; /* user register alignment */ + struct delayed_work ipath_autoneg_work; + wait_queue_head_t ipath_autoneg_wait; + /* HoL blocking / user app forward-progress state */ unsigned ipath_hol_state; unsigned ipath_hol_next; @@ -485,6 +559,8 @@ struct ipath_devdata { u64 ipath_intconfig; /* kr_sendpiobufbase value */ u64 ipath_piobufbase; + /* kr_ibcddrctrl shadow */ + u64 ipath_ibcddrctrl; /* these are the "32 bit" regs */ @@ -501,7 +577,10 @@ struct ipath_devdata { unsigned long ipath_rcvctrl; /* shadow kr_sendctrl */ unsigned long ipath_sendctrl; - unsigned long ipath_lastcancel; /* to not count armlaunch after cancel */ + /* to not count armlaunch after cancel */ + unsigned long ipath_lastcancel; + /* count cases where special trigger was needed (double write) */ + unsigned long ipath_spectriggerhit; /* value we put in kr_rcvhdrcnt */ u32 ipath_rcvhdrcnt; @@ -523,6 +602,7 @@ struct ipath_devdata { u32 ipath_piobcnt4k; /* size in bytes of "4KB" PIO buffers */ u32 ipath_piosize4k; + u32 ipath_pioreserved; /* reserved special-inkernel; */ /* kr_rcvegrbase value */ u32 
ipath_rcvegrbase; /* kr_rcvegrcnt value */ @@ -586,7 +666,7 @@ struct ipath_devdata { */ u8 ipath_serial[16]; /* human readable board version */ - u8 ipath_boardversion[80]; + u8 ipath_boardversion[96]; u8 ipath_lbus_info[32]; /* human readable localbus info */ /* chip major rev, from ipath_revision */ u8 ipath_majrev; @@ -715,6 +795,13 @@ struct ipath_devdata { /* interrupt mitigation reload register info */ u16 ipath_jint_idle_ticks; /* idle clock ticks */ u16 ipath_jint_max_packets; /* max packets across all ports */ + + /* + * lock for access to SerDes, and flags to sequence preset + * versus steady-state. 7220-only at the moment. + */ + spinlock_t ipath_sdepb_lock; + u8 ipath_presets_needed; /* Set if presets to be restored next DOWN */ }; /* ipath_hol_state values (stopping/starting user proc, send flushing) */ @@ -724,11 +811,35 @@ struct ipath_devdata { #define IPATH_HOL_DOWNSTOP 0 #define IPATH_HOL_DOWNCONT 1 +/* bit positions for sdma_status */ +#define IPATH_SDMA_ABORTING 0 +#define IPATH_SDMA_DISARMED 1 +#define IPATH_SDMA_DISABLED 2 +#define IPATH_SDMA_LAYERBUF 3 +#define IPATH_SDMA_RUNNING 62 +#define IPATH_SDMA_SHUTDOWN 63 + +/* bit combinations that correspond to abort states */ +#define IPATH_SDMA_ABORT_NONE 0 +#define IPATH_SDMA_ABORT_ABORTING (1UL << IPATH_SDMA_ABORTING) +#define IPATH_SDMA_ABORT_DISARMED ((1UL << IPATH_SDMA_ABORTING) | \ + (1UL << IPATH_SDMA_DISARMED)) +#define IPATH_SDMA_ABORT_DISABLED ((1UL << IPATH_SDMA_ABORTING) | \ + (1UL << IPATH_SDMA_DISABLED)) +#define IPATH_SDMA_ABORT_ABORTED ((1UL << IPATH_SDMA_ABORTING) | \ + (1UL << IPATH_SDMA_DISARMED) | (1UL << IPATH_SDMA_DISABLED)) +#define IPATH_SDMA_ABORT_MASK ((1UL<private_data)->pd @@ -804,6 +919,8 @@ void ipath_hol_event(unsigned long); ((struct ipath_filedata *)(fp)->private_data)->subport #define tidcursor_fp(fp) \ ((struct ipath_filedata *)(fp)->private_data)->tidcursor +#define user_sdma_queue_fp(fp) \ + ((struct ipath_filedata *)(fp)->private_data)->pq /* * values for 
ipath_flags @@ -853,9 +970,16 @@ void ipath_hol_event(unsigned long); /* Use GPIO interrupts for new counters */ #define IPATH_GPIO_ERRINTRS 0x100000 #define IPATH_SWAP_PIOBUFS 0x200000 + /* Supports Send DMA */ +#define IPATH_HAS_SEND_DMA 0x400000 + /* Supports Send Count (not just word count) in PBC */ +#define IPATH_HAS_PBC_CNT 0x800000 /* Suppress heartbeat, even if turning off loopback */ #define IPATH_NO_HRTBT 0x1000000 +#define IPATH_HAS_THRESH_UPDATE 0x4000000 #define IPATH_HAS_MULT_IB_SPEED 0x8000000 +#define IPATH_IB_AUTONEG_INPROG 0x10000000 +#define IPATH_IB_AUTONEG_FAILED 0x20000000 /* Linkdown-disable intentionally, Do not attempt to bring up */ #define IPATH_IB_LINK_DISABLED 0x40000000 #define IPATH_IB_FORCE_NOTIFY 0x80000000 /* force notify on next ib change */ @@ -880,6 +1004,7 @@ void ipath_free_data(struct ipath_portdata *dd); u32 __iomem *ipath_getpiobuf(struct ipath_devdata *, u32, u32 *); void ipath_chg_pioavailkernel(struct ipath_devdata *dd, unsigned start, unsigned len, int avail); +void ipath_init_iba7220_funcs(struct ipath_devdata *); void ipath_init_iba6120_funcs(struct ipath_devdata *); void ipath_init_iba6110_funcs(struct ipath_devdata *); void ipath_get_eeprom_info(struct ipath_devdata *); @@ -898,6 +1023,33 @@ void signal_ib_event(struct ipath_devdata *dd, enum ib_event_type ev); #define IPATH_LED_LOG 2 /* Logical (link) YELLOW LED */ void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val); +/* send dma routines */ +int setup_sdma(struct ipath_devdata *); +void teardown_sdma(struct ipath_devdata *); +void ipath_sdma_intr(struct ipath_devdata *); +int ipath_sdma_verbs_send(struct ipath_devdata *, struct ipath_sge_state *, + u32, struct ipath_verbs_txreq *); +/* ipath_sdma_lock should be locked before calling this. 
*/ +int ipath_sdma_make_progress(struct ipath_devdata *dd); + +/* must be called under ipath_sdma_lock */ +static inline u16 ipath_sdma_descq_freecnt(const struct ipath_devdata *dd) +{ + return dd->ipath_sdma_descq_cnt - + (dd->ipath_sdma_descq_added - dd->ipath_sdma_descq_removed) - + 1 - dd->ipath_sdma_desc_nreserved; +} + +static inline void ipath_sdma_desc_reserve(struct ipath_devdata *dd, u16 cnt) +{ + dd->ipath_sdma_desc_nreserved += cnt; +} + +static inline void ipath_sdma_desc_unreserve(struct ipath_devdata *dd, u16 cnt) +{ + dd->ipath_sdma_desc_nreserved -= cnt; +} + /* * number of words used for protocol header if not set by ipath_userinit(); */ @@ -926,8 +1078,7 @@ void ipath_write_kreg_port(const struct ipath_devdata *, ipath_kreg, /* * At the moment, none of the s-registers are writable, so no - * ipath_write_sreg(), and none of the c-registers are writable, so no - * ipath_write_creg(). + * ipath_write_sreg(). */ /** @@ -1124,6 +1275,7 @@ int ipathfs_remove_device(struct ipath_devdata *); dma_addr_t ipath_map_page(struct pci_dev *, struct page *, unsigned long, size_t, int); dma_addr_t ipath_map_single(struct pci_dev *, void *, size_t, int); +const char *ipath_get_unit_name(int unit); /* * Flush write combining store buffers (if present) and perform a write @@ -1138,11 +1290,6 @@ dma_addr_t ipath_map_single(struct pci_dev *, void *, size_t, int); extern unsigned ipath_debug; /* debugging bit mask */ extern unsigned ipath_linkrecovery; extern unsigned ipath_mtu4096; - -#define IPATH_MAX_PARITY_ATTEMPTS 10000 /* max times to try recovery */ - -const char *ipath_get_unit_name(int unit); - extern struct mutex ipath_mutex; #define IPATH_DRV_NAME "ib_ipath" diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h index b7d87d3..8f44d0c 100644 --- a/drivers/infiniband/hw/ipath/ipath_registers.h +++ b/drivers/infiniband/hw/ipath/ipath_registers.h @@ -73,56 +73,82 @@ #define IPATH_S_PIOINTBUFAVAIL 1 #define 
IPATH_S_PIOBUFAVAILUPD 2 #define IPATH_S_PIOENABLE 3 +#define IPATH_S_SDMAINTENABLE 9 +#define IPATH_S_SDMASINGLEDESCRIPTOR 10 +#define IPATH_S_SDMAENABLE 11 +#define IPATH_S_SDMAHALT 12 #define IPATH_S_DISARM 31 #define INFINIPATH_S_ABORT (1U << IPATH_S_ABORT) #define INFINIPATH_S_PIOINTBUFAVAIL (1U << IPATH_S_PIOINTBUFAVAIL) #define INFINIPATH_S_PIOBUFAVAILUPD (1U << IPATH_S_PIOBUFAVAILUPD) #define INFINIPATH_S_PIOENABLE (1U << IPATH_S_PIOENABLE) +#define INFINIPATH_S_SDMAINTENABLE (1U << IPATH_S_SDMAINTENABLE) +#define INFINIPATH_S_SDMASINGLEDESCRIPTOR \ + (1U << IPATH_S_SDMASINGLEDESCRIPTOR) +#define INFINIPATH_S_SDMAENABLE (1U << IPATH_S_SDMAENABLE) +#define INFINIPATH_S_SDMAHALT (1U << IPATH_S_SDMAHALT) #define INFINIPATH_S_DISARM (1U << IPATH_S_DISARM) -/* kr_rcvctrl bits */ +/* kr_rcvctrl bits that are the same on multiple chips */ #define INFINIPATH_R_PORTENABLE_SHIFT 0 #define INFINIPATH_R_QPMAP_ENABLE (1ULL << 38) /* kr_intstatus, kr_intclear, kr_intmask bits */ -#define INFINIPATH_I_ERROR 0x80000000 -#define INFINIPATH_I_SPIOSENT 0x40000000 -#define INFINIPATH_I_SPIOBUFAVAIL 0x20000000 -#define INFINIPATH_I_GPIO 0x10000000 +#define INFINIPATH_I_SDMAINT 0x8000000000000000ULL +#define INFINIPATH_I_SDMADISABLED 0x4000000000000000ULL +#define INFINIPATH_I_ERROR 0x0000000080000000ULL +#define INFINIPATH_I_SPIOSENT 0x0000000040000000ULL +#define INFINIPATH_I_SPIOBUFAVAIL 0x0000000020000000ULL +#define INFINIPATH_I_GPIO 0x0000000010000000ULL +#define INFINIPATH_I_JINT 0x0000000004000000ULL /* kr_errorstatus, kr_errorclear, kr_errormask bits */ -#define INFINIPATH_E_RFORMATERR 0x0000000000000001ULL -#define INFINIPATH_E_RVCRC 0x0000000000000002ULL -#define INFINIPATH_E_RICRC 0x0000000000000004ULL -#define INFINIPATH_E_RMINPKTLEN 0x0000000000000008ULL -#define INFINIPATH_E_RMAXPKTLEN 0x0000000000000010ULL -#define INFINIPATH_E_RLONGPKTLEN 0x0000000000000020ULL -#define INFINIPATH_E_RSHORTPKTLEN 0x0000000000000040ULL -#define INFINIPATH_E_RUNEXPCHAR 
0x0000000000000080ULL -#define INFINIPATH_E_RUNSUPVL 0x0000000000000100ULL -#define INFINIPATH_E_REBP 0x0000000000000200ULL -#define INFINIPATH_E_RIBFLOW 0x0000000000000400ULL -#define INFINIPATH_E_RBADVERSION 0x0000000000000800ULL -#define INFINIPATH_E_RRCVEGRFULL 0x0000000000001000ULL -#define INFINIPATH_E_RRCVHDRFULL 0x0000000000002000ULL -#define INFINIPATH_E_RBADTID 0x0000000000004000ULL -#define INFINIPATH_E_RHDRLEN 0x0000000000008000ULL -#define INFINIPATH_E_RHDR 0x0000000000010000ULL -#define INFINIPATH_E_RIBLOSTLINK 0x0000000000020000ULL -#define INFINIPATH_E_SMINPKTLEN 0x0000000020000000ULL -#define INFINIPATH_E_SMAXPKTLEN 0x0000000040000000ULL -#define INFINIPATH_E_SUNDERRUN 0x0000000080000000ULL -#define INFINIPATH_E_SPKTLEN 0x0000000100000000ULL -#define INFINIPATH_E_SDROPPEDSMPPKT 0x0000000200000000ULL -#define INFINIPATH_E_SDROPPEDDATAPKT 0x0000000400000000ULL -#define INFINIPATH_E_SPIOARMLAUNCH 0x0000000800000000ULL -#define INFINIPATH_E_SUNEXPERRPKTNUM 0x0000001000000000ULL -#define INFINIPATH_E_SUNSUPVL 0x0000002000000000ULL -#define INFINIPATH_E_IBSTATUSCHANGED 0x0001000000000000ULL -#define INFINIPATH_E_INVALIDADDR 0x0002000000000000ULL -#define INFINIPATH_E_RESET 0x0004000000000000ULL -#define INFINIPATH_E_HARDWARE 0x0008000000000000ULL +#define INFINIPATH_E_RFORMATERR 0x0000000000000001ULL +#define INFINIPATH_E_RVCRC 0x0000000000000002ULL +#define INFINIPATH_E_RICRC 0x0000000000000004ULL +#define INFINIPATH_E_RMINPKTLEN 0x0000000000000008ULL +#define INFINIPATH_E_RMAXPKTLEN 0x0000000000000010ULL +#define INFINIPATH_E_RLONGPKTLEN 0x0000000000000020ULL +#define INFINIPATH_E_RSHORTPKTLEN 0x0000000000000040ULL +#define INFINIPATH_E_RUNEXPCHAR 0x0000000000000080ULL +#define INFINIPATH_E_RUNSUPVL 0x0000000000000100ULL +#define INFINIPATH_E_REBP 0x0000000000000200ULL +#define INFINIPATH_E_RIBFLOW 0x0000000000000400ULL +#define INFINIPATH_E_RBADVERSION 0x0000000000000800ULL +#define INFINIPATH_E_RRCVEGRFULL 0x0000000000001000ULL +#define 
INFINIPATH_E_RRCVHDRFULL 0x0000000000002000ULL +#define INFINIPATH_E_RBADTID 0x0000000000004000ULL +#define INFINIPATH_E_RHDRLEN 0x0000000000008000ULL +#define INFINIPATH_E_RHDR 0x0000000000010000ULL +#define INFINIPATH_E_RIBLOSTLINK 0x0000000000020000ULL +#define INFINIPATH_E_SENDSPECIALTRIGGER 0x0000000008000000ULL +#define INFINIPATH_E_SDMADISABLED 0x0000000010000000ULL +#define INFINIPATH_E_SMINPKTLEN 0x0000000020000000ULL +#define INFINIPATH_E_SMAXPKTLEN 0x0000000040000000ULL +#define INFINIPATH_E_SUNDERRUN 0x0000000080000000ULL +#define INFINIPATH_E_SPKTLEN 0x0000000100000000ULL +#define INFINIPATH_E_SDROPPEDSMPPKT 0x0000000200000000ULL +#define INFINIPATH_E_SDROPPEDDATAPKT 0x0000000400000000ULL +#define INFINIPATH_E_SPIOARMLAUNCH 0x0000000800000000ULL +#define INFINIPATH_E_SUNEXPERRPKTNUM 0x0000001000000000ULL +#define INFINIPATH_E_SUNSUPVL 0x0000002000000000ULL +#define INFINIPATH_E_SENDBUFMISUSE 0x0000004000000000ULL +#define INFINIPATH_E_SDMAGENMISMATCH 0x0000008000000000ULL +#define INFINIPATH_E_SDMAOUTOFBOUND 0x0000010000000000ULL +#define INFINIPATH_E_SDMATAILOUTOFBOUND 0x0000020000000000ULL +#define INFINIPATH_E_SDMABASE 0x0000040000000000ULL +#define INFINIPATH_E_SDMA1STDESC 0x0000080000000000ULL +#define INFINIPATH_E_SDMARPYTAG 0x0000100000000000ULL +#define INFINIPATH_E_SDMADWEN 0x0000200000000000ULL +#define INFINIPATH_E_SDMAMISSINGDW 0x0000400000000000ULL +#define INFINIPATH_E_SDMAUNEXPDATA 0x0000800000000000ULL +#define INFINIPATH_E_IBSTATUSCHANGED 0x0001000000000000ULL +#define INFINIPATH_E_INVALIDADDR 0x0002000000000000ULL +#define INFINIPATH_E_RESET 0x0004000000000000ULL +#define INFINIPATH_E_HARDWARE 0x0008000000000000ULL +#define INFINIPATH_E_SDMADESCADDRMISALIGN 0x0010000000000000ULL +#define INFINIPATH_E_INVALIDEEPCMD 0x0020000000000000ULL /* * this is used to print "common" packet errors only when the @@ -133,6 +159,17 @@ | INFINIPATH_E_RICRC | INFINIPATH_E_RSHORTPKTLEN \ | INFINIPATH_E_REBP ) +/* Convenience for decoding Send DMA errors 
*/ +#define INFINIPATH_E_SDMAERRS ( \ + INFINIPATH_E_SDMAGENMISMATCH | INFINIPATH_E_SDMAOUTOFBOUND | \ + INFINIPATH_E_SDMATAILOUTOFBOUND | INFINIPATH_E_SDMABASE | \ + INFINIPATH_E_SDMA1STDESC | INFINIPATH_E_SDMARPYTAG | \ + INFINIPATH_E_SDMADWEN | INFINIPATH_E_SDMAMISSINGDW | \ + INFINIPATH_E_SDMAUNEXPDATA | \ + INFINIPATH_E_SDMADESCADDRMISALIGN | \ + INFINIPATH_E_SDMADISABLED | \ + INFINIPATH_E_SENDBUFMISUSE) + /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ /* TXEMEMPARITYERR bit 0: PIObuf, 1: PIOpbc, 2: launchfifo * RXEMEMPARITYERR bit 0: rcvbuf, 1: lookupq, 2: expTID, 3: eagerTID @@ -157,7 +194,7 @@ #define INFINIPATH_HWE_RXEMEMPARITYERR_HDRINFO 0x40ULL /* waldo specific -- find the rest in ipath_6110.c */ #define INFINIPATH_HWE_RXDSYNCMEMPARITYERR 0x0000000400000000ULL -/* monty specific -- find the rest in ipath_6120.c */ +/* 6120/7220 specific -- find the rest in ipath_6120.c and ipath_7220.c */ #define INFINIPATH_HWE_MEMBISTFAILED 0x0040000000000000ULL /* kr_hwdiagctrl bits */ @@ -202,7 +239,7 @@ /* kr_ibcstatus bits */ #define INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT 0 #define INFINIPATH_IBCS_LINKSTATE_MASK 0x7 -#define INFINIPATH_IBCS_LINKSTATE_SHIFT 4 + #define INFINIPATH_IBCS_TXREADY 0x40000000 #define INFINIPATH_IBCS_TXCREDITOK 0x80000000 /* link training states (shift by @@ -267,7 +304,7 @@ /* L1 Power down; use with RXDETECT, Otherwise not used on IB side */ #define INFINIPATH_SERDC0_L1PWR_DN 0xF0ULL -/* kr_xgxsconfig bits */ +/* common kr_xgxsconfig bits (or safe in all, even if not implemented) */ #define INFINIPATH_XGXS_RX_POL_SHIFT 19 #define INFINIPATH_XGXS_RX_POL_MASK 0xfULL @@ -397,6 +434,29 @@ struct ipath_kregs { ipath_kreg kr_pcieq1serdesconfig0; ipath_kreg kr_pcieq1serdesconfig1; ipath_kreg kr_pcieq1serdesstatus; + ipath_kreg kr_hrtbt_guid; + ipath_kreg kr_ibcddrctrl; + ipath_kreg kr_ibcddrstatus; + ipath_kreg kr_jintreload; + + /* send dma related regs */ + ipath_kreg kr_senddmabase; + ipath_kreg kr_senddmalengen; + ipath_kreg 
kr_senddmatail; + ipath_kreg kr_senddmahead; + ipath_kreg kr_senddmaheadaddr; + ipath_kreg kr_senddmabufmask0; + ipath_kreg kr_senddmabufmask1; + ipath_kreg kr_senddmabufmask2; + ipath_kreg kr_senddmastatus; + + /* SerDes related regs (IBA7220-only) */ + ipath_kreg kr_ibserdesctrl; + ipath_kreg kr_ib_epbacc; + ipath_kreg kr_ib_epbtrans; + ipath_kreg kr_pcie_epbacc; + ipath_kreg kr_pcie_epbtrans; + ipath_kreg kr_ib_ddsrxeq; }; struct ipath_cregs { diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index 3d59736..056e741 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -138,6 +138,11 @@ struct ipath_ib_header { } u; } __attribute__ ((packed)); +struct ipath_pio_header { + __le32 pbc[2]; + struct ipath_ib_header hdr; +} __attribute__ ((packed)); + /* * There is one struct ipath_mcast for each multicast GID. * All attached QPs are then stored as a list of @@ -319,6 +324,7 @@ struct ipath_sge_state { struct ipath_sge *sg_list; /* next SGE to be used if any */ struct ipath_sge sge; /* progress state for the current SGE */ u8 num_sge; + u8 static_rate; }; /* @@ -356,6 +362,7 @@ struct ipath_qp { struct tasklet_struct s_task; struct ipath_mmap_info *ip; struct ipath_sge_state *s_cur_sge; + struct ipath_verbs_txreq *s_tx; struct ipath_sge_state s_sge; /* current send request data */ struct ipath_ack_entry s_ack_queue[IPATH_MAX_RDMA_ATOMIC + 1]; struct ipath_sge_state s_ack_rdma_sge; @@ -363,7 +370,8 @@ struct ipath_qp { struct ipath_sge_state r_sge; /* current receive data */ spinlock_t s_lock; unsigned long s_busy; - u32 s_hdrwords; /* size of s_hdr in 32 bit words */ + u16 s_pkt_delay; + u16 s_hdrwords; /* size of s_hdr in 32 bit words */ u32 s_cur_size; /* size of send packet in bytes */ u32 s_len; /* total length of s_sge */ u32 s_rdma_read_len; /* total length of s_rdma_read_sge */ @@ -387,7 +395,6 @@ struct ipath_qp { u8 r_nak_state; /* non-zero if NAK is pending 
*/ u8 r_min_rnr_timer; /* retry timeout value for RNR NAKs */ u8 r_reuse_sge; /* for UC receive errors */ - u8 r_sge_inx; /* current index into sg_list */ u8 r_wrid_valid; /* r_wrid set but CQ entry not yet made */ u8 r_max_rd_atomic; /* max number of RDMA read/atomic to receive */ u8 r_head_ack_queue; /* index into s_ack_queue[] */ @@ -403,6 +410,7 @@ struct ipath_qp { u8 s_num_rd_atomic; /* number of RDMA read/atomic pending */ u8 s_tail_ack_queue; /* index into s_ack_queue[] */ u8 s_flags; + u8 s_dmult; u8 timeout; /* Timeout for this QP */ enum ib_mtu path_mtu; u32 remote_qpn; @@ -510,6 +518,8 @@ struct ipath_ibdev { struct ipath_lkey_table lk_table; struct list_head pending[3]; /* FIFO of QPs waiting for ACKs */ struct list_head piowait; /* list for wait PIO buf */ + struct list_head txreq_free; + void *txreq_bufs; /* list of QPs waiting for RNR timer */ struct list_head rnrwait; spinlock_t pending_lock; @@ -570,6 +580,7 @@ struct ipath_ibdev { u32 n_rdma_dup_busy; u32 n_piowait; u32 n_no_piobuf; + u32 n_unaligned; u32 port_cap_flags; u32 pma_sample_start; u32 pma_sample_interval; @@ -581,7 +592,6 @@ struct ipath_ibdev { u16 pending_index; /* which pending queue is active */ u8 pma_sample_status; u8 subnet_timeout; - u8 link_width_enabled; u8 vl_high_limit; struct ipath_opcode_stats opstats[128]; }; @@ -602,6 +612,16 @@ struct ipath_verbs_counters { u32 vl15_dropped; }; +struct ipath_verbs_txreq { + struct ipath_qp *qp; + struct ipath_swqe *wqe; + u32 map_len; + u32 len; + struct ipath_sge_state *ss; + struct ipath_pio_header hdr; + struct ipath_sdma_txreq txreq; +}; + static inline struct ipath_mr *to_imr(struct ib_mr *ibmr) { return container_of(ibmr, struct ipath_mr, ibmr); @@ -694,11 +714,13 @@ void ipath_sqerror_qp(struct ipath_qp *qp, struct ib_wc *wc); void ipath_get_credit(struct ipath_qp *qp, u32 aeth); +unsigned ipath_ib_rate_to_mult(enum ib_rate rate); + +enum ib_rate ipath_mult_to_ib_rate(unsigned mult); + int ipath_verbs_send(struct ipath_qp *qp, 
struct ipath_ib_header *hdr, u32 hdrwords, struct ipath_sge_state *ss, u32 len); -void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig); - void ipath_copy_sge(struct ipath_sge_state *ss, void *data, u32 length); void ipath_skip_sge(struct ipath_sge_state *ss, u32 length); From ralph.campbell at qlogic.com Wed Apr 2 15:49:57 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:57 -0700 Subject: [ofa-general] [PATCH 11/20] IB/ipath - isolate 7220-specific content In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224957.28598.51916.stgit@eng-46.mv.qlogic.com> From: Michael Albaugh This patch adds a new ASIC-specific header file for the HCAs using the IBA7220. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_7220.h | 57 ++++++++++++++++++++++++++++++ 1 files changed, 57 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_7220.h b/drivers/infiniband/hw/ipath/ipath_7220.h new file mode 100644 index 0000000..74fa5cc --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_7220.h @@ -0,0 +1,57 @@ +#ifndef _IPATH_7220_H +#define _IPATH_7220_H +/* + * Copyright (c) 2007 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +/* + * This header file provides the declarations and common definitions + * for (mostly) manipulation of the SerDes blocks within the IBA7220. + * The functions declared should only be called from within other + * 7220-related files such as ipath_iba7220.c or ipath_sd7220.c. 
+ */ +int ipath_sd7220_presets(struct ipath_devdata *dd); +int ipath_sd7220_init(struct ipath_devdata *dd, int was_reset); +int ipath_sd7220_prog_ld(struct ipath_devdata *dd, int sdnum, u8 *img, + int len, int offset); +int ipath_sd7220_prog_vfy(struct ipath_devdata *dd, int sdnum, const u8 *img, + int len, int offset); +/* + * Below used for sdnum parameter, selecting one of the two sections + * used for PCIe, or the single SerDes used for IB, which is the + * only one currently used + */ +#define IB_7220_SERDES 2 + +int ipath_sd7220_ib_load(struct ipath_devdata *dd); +int ipath_sd7220_ib_vfy(struct ipath_devdata *dd); + +#endif /* _IPATH_7220_H */ From ralph.campbell at qlogic.com Wed Apr 2 15:50:02 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:02 -0700 Subject: [ofa-general] [PATCH 12/20] IB/ipath - HCA specific code to support IBA7220 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225002.28598.23449.stgit@eng-46.mv.qlogic.com> This patch adds the HCA specific code for the IBA7220 HCA. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_iba7220.c | 2571 +++++++++++++++++++++++++++ 1 files changed, 2571 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_iba7220.c b/drivers/infiniband/hw/ipath/ipath_iba7220.c new file mode 100644 index 0000000..1b2de2c --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_iba7220.c @@ -0,0 +1,2571 @@ +/* + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +/* + * This file contains all of the code that is specific to the + * InfiniPath 7220 chip (except that specific to the SerDes) + */ + +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" +#include "ipath_registers.h" +#include "ipath_7220.h" + +static void ipath_setup_7220_setextled(struct ipath_devdata *, u64, u64); + +static unsigned ipath_compat_ddr_negotiate = 1; + +module_param_named(compat_ddr_negotiate, ipath_compat_ddr_negotiate, uint, + S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(compat_ddr_negotiate, + "Attempt pre-IBTA 1.2 DDR speed negotiation"); + +static unsigned ipath_sdma_fetch_arb = 1; +module_param_named(fetch_arb, ipath_sdma_fetch_arb, uint, S_IRUGO); +MODULE_PARM_DESC(fetch_arb, "IBA7220: change SDMA descriptor arbitration"); + +/* + * This file contains almost all the chip-specific register information and + * access functions for the QLogic InfiniPath 7220 PCI-Express chip, with the + * exception of SerDes support, which is in ipath_sd7220.c. + * + * This lists the InfiniPath registers, in the actual chip layout. + * This structure should never be directly accessed. 
+ */ +struct _infinipath_do_not_use_kernel_regs { + unsigned long long Revision; + unsigned long long Control; + unsigned long long PageAlign; + unsigned long long PortCnt; + unsigned long long DebugPortSelect; + unsigned long long DebugSigsIntSel; /* was Reserved0;*/ + unsigned long long SendRegBase; + unsigned long long UserRegBase; + unsigned long long CounterRegBase; + unsigned long long Scratch; + unsigned long long EEPROMAddrCmd; /* was Reserved1; */ + unsigned long long EEPROMData; /* was Reserved2; */ + unsigned long long IntBlocked; + unsigned long long IntMask; + unsigned long long IntStatus; + unsigned long long IntClear; + unsigned long long ErrorMask; + unsigned long long ErrorStatus; + unsigned long long ErrorClear; + unsigned long long HwErrMask; + unsigned long long HwErrStatus; + unsigned long long HwErrClear; + unsigned long long HwDiagCtrl; + unsigned long long MDIO; + unsigned long long IBCStatus; + unsigned long long IBCCtrl; + unsigned long long ExtStatus; + unsigned long long ExtCtrl; + unsigned long long GPIOOut; + unsigned long long GPIOMask; + unsigned long long GPIOStatus; + unsigned long long GPIOClear; + unsigned long long RcvCtrl; + unsigned long long RcvBTHQP; + unsigned long long RcvHdrSize; + unsigned long long RcvHdrCnt; + unsigned long long RcvHdrEntSize; + unsigned long long RcvTIDBase; + unsigned long long RcvTIDCnt; + unsigned long long RcvEgrBase; + unsigned long long RcvEgrCnt; + unsigned long long RcvBufBase; + unsigned long long RcvBufSize; + unsigned long long RxIntMemBase; + unsigned long long RxIntMemSize; + unsigned long long RcvPartitionKey; + unsigned long long RcvQPMulticastPort; + unsigned long long RcvPktLEDCnt; + unsigned long long IBCDDRCtrl; + unsigned long long HRTBT_GUID; + unsigned long long IB_SDTEST_IF_TX; + unsigned long long IB_SDTEST_IF_RX; + unsigned long long IBCDDRCtrl2; + unsigned long long IBCDDRStatus; + unsigned long long JIntReload; + unsigned long long IBNCModeCtrl; + unsigned long long 
SendCtrl; + unsigned long long SendBufBase; + unsigned long long SendBufSize; + unsigned long long SendBufCnt; + unsigned long long SendAvailAddr; + unsigned long long TxIntMemBase; + unsigned long long TxIntMemSize; + unsigned long long SendDmaBase; + unsigned long long SendDmaLenGen; + unsigned long long SendDmaTail; + unsigned long long SendDmaHead; + unsigned long long SendDmaHeadAddr; + unsigned long long SendDmaBufMask0; + unsigned long long SendDmaBufMask1; + unsigned long long SendDmaBufMask2; + unsigned long long SendDmaStatus; + unsigned long long SendBufferError; + unsigned long long SendBufferErrorCONT1; + unsigned long long SendBufErr2; /* was Reserved6SBE[0/6] */ + unsigned long long Reserved6L[2]; + unsigned long long AvailUpdCount; + unsigned long long RcvHdrAddr0; + unsigned long long RcvHdrAddrs[16]; /* Why enumerate? */ + unsigned long long Reserved7hdtl; /* Align next to 300 */ + unsigned long long RcvHdrTailAddr0; /* 300, like others */ + unsigned long long RcvHdrTailAddrs[16]; + unsigned long long Reserved9SW[7]; /* was [8]; we have 17 ports */ + unsigned long long IbsdEpbAccCtl; /* IB Serdes EPB access control */ + unsigned long long IbsdEpbTransReg; /* IB Serdes EPB Transaction */ + unsigned long long Reserved10sds; /* was SerdesStatus on */ + unsigned long long XGXSConfig; + unsigned long long IBSerDesCtrl; /* Was IBPLLCfg on Monty */ + unsigned long long EEPCtlStat; /* for "boot" EEPROM/FLASH */ + unsigned long long EEPAddrCmd; + unsigned long long EEPData; + unsigned long long PcieEpbAccCtl; + unsigned long long PcieEpbTransCtl; + unsigned long long EfuseCtl; /* E-Fuse control */ + unsigned long long EfuseData[4]; + unsigned long long ProcMon; + /* this chip moves following two from previous 200, 208 */ + unsigned long long PCIeRBufTestReg0; + unsigned long long PCIeRBufTestReg1; + /* added for this chip */ + unsigned long long PCIeRBufTestReg2; + unsigned long long PCIeRBufTestReg3; + /* added for this chip, debug only */ + unsigned long 
long SPC_JTAG_ACCESS_REG; + unsigned long long LAControlReg; + unsigned long long GPIODebugSelReg; + unsigned long long DebugPortValueReg; + /* added for this chip, DMA */ + unsigned long long SendDmaBufUsed[3]; + unsigned long long SendDmaReqTagUsed; + /* + * added for this chip, EFUSE: note that these program 64-bit + * words 2 and 3 */ + unsigned long long efuse_pgm_data[2]; + unsigned long long Reserved11LAalign[10]; /* Skip 4B0..4F8 */ + /* we have 30 regs for DDS and RXEQ in IB SERDES */ + unsigned long long SerDesDDSRXEQ[30]; + unsigned long long Reserved12LAalign[2]; /* Skip 5F0, 5F8 */ + /* added for LA debug support */ + unsigned long long LAMemory[32]; +}; + +struct _infinipath_do_not_use_counters { + __u64 LBIntCnt; + __u64 LBFlowStallCnt; + __u64 TxSDmaDescCnt; /* was Reserved1 */ + __u64 TxUnsupVLErrCnt; + __u64 TxDataPktCnt; + __u64 TxFlowPktCnt; + __u64 TxDwordCnt; + __u64 TxLenErrCnt; + __u64 TxMaxMinLenErrCnt; + __u64 TxUnderrunCnt; + __u64 TxFlowStallCnt; + __u64 TxDroppedPktCnt; + __u64 RxDroppedPktCnt; + __u64 RxDataPktCnt; + __u64 RxFlowPktCnt; + __u64 RxDwordCnt; + __u64 RxLenErrCnt; + __u64 RxMaxMinLenErrCnt; + __u64 RxICRCErrCnt; + __u64 RxVCRCErrCnt; + __u64 RxFlowCtrlErrCnt; + __u64 RxBadFormatCnt; + __u64 RxLinkProblemCnt; + __u64 RxEBPCnt; + __u64 RxLPCRCErrCnt; + __u64 RxBufOvflCnt; + __u64 RxTIDFullErrCnt; + __u64 RxTIDValidErrCnt; + __u64 RxPKeyMismatchCnt; + __u64 RxP0HdrEgrOvflCnt; + __u64 RxP1HdrEgrOvflCnt; + __u64 RxP2HdrEgrOvflCnt; + __u64 RxP3HdrEgrOvflCnt; + __u64 RxP4HdrEgrOvflCnt; + __u64 RxP5HdrEgrOvflCnt; + __u64 RxP6HdrEgrOvflCnt; + __u64 RxP7HdrEgrOvflCnt; + __u64 RxP8HdrEgrOvflCnt; + __u64 RxP9HdrEgrOvflCnt; /* was Reserved6 */ + __u64 RxP10HdrEgrOvflCnt; /* was Reserved7 */ + __u64 RxP11HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 RxP12HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 RxP13HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 RxP14HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 RxP15HdrEgrOvflCnt; /* new for IBA7220 */ 
+ __u64 RxP16HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 IBStatusChangeCnt; + __u64 IBLinkErrRecoveryCnt; + __u64 IBLinkDownedCnt; + __u64 IBSymbolErrCnt; + /* The following are new for IBA7220 */ + __u64 RxVL15DroppedPktCnt; + __u64 RxOtherLocalPhyErrCnt; + __u64 PcieRetryBufDiagQwordCnt; + __u64 ExcessBufferOvflCnt; + __u64 LocalLinkIntegrityErrCnt; + __u64 RxVlErrCnt; + __u64 RxDlidFltrCnt; + __u64 Reserved8[7]; + __u64 PSStat; + __u64 PSStart; + __u64 PSInterval; + __u64 PSRcvDataCount; + __u64 PSRcvPktsCount; + __u64 PSXmitDataCount; + __u64 PSXmitPktsCount; + __u64 PSXmitWaitCount; +}; + +#define IPATH_KREG_OFFSET(field) (offsetof( \ + struct _infinipath_do_not_use_kernel_regs, field) / sizeof(u64)) +#define IPATH_CREG_OFFSET(field) (offsetof( \ + struct _infinipath_do_not_use_counters, field) / sizeof(u64)) + +static const struct ipath_kregs ipath_7220_kregs = { + .kr_control = IPATH_KREG_OFFSET(Control), + .kr_counterregbase = IPATH_KREG_OFFSET(CounterRegBase), + .kr_debugportselect = IPATH_KREG_OFFSET(DebugPortSelect), + .kr_errorclear = IPATH_KREG_OFFSET(ErrorClear), + .kr_errormask = IPATH_KREG_OFFSET(ErrorMask), + .kr_errorstatus = IPATH_KREG_OFFSET(ErrorStatus), + .kr_extctrl = IPATH_KREG_OFFSET(ExtCtrl), + .kr_extstatus = IPATH_KREG_OFFSET(ExtStatus), + .kr_gpio_clear = IPATH_KREG_OFFSET(GPIOClear), + .kr_gpio_mask = IPATH_KREG_OFFSET(GPIOMask), + .kr_gpio_out = IPATH_KREG_OFFSET(GPIOOut), + .kr_gpio_status = IPATH_KREG_OFFSET(GPIOStatus), + .kr_hwdiagctrl = IPATH_KREG_OFFSET(HwDiagCtrl), + .kr_hwerrclear = IPATH_KREG_OFFSET(HwErrClear), + .kr_hwerrmask = IPATH_KREG_OFFSET(HwErrMask), + .kr_hwerrstatus = IPATH_KREG_OFFSET(HwErrStatus), + .kr_ibcctrl = IPATH_KREG_OFFSET(IBCCtrl), + .kr_ibcstatus = IPATH_KREG_OFFSET(IBCStatus), + .kr_intblocked = IPATH_KREG_OFFSET(IntBlocked), + .kr_intclear = IPATH_KREG_OFFSET(IntClear), + .kr_intmask = IPATH_KREG_OFFSET(IntMask), + .kr_intstatus = IPATH_KREG_OFFSET(IntStatus), + .kr_mdio = IPATH_KREG_OFFSET(MDIO), + 
.kr_pagealign = IPATH_KREG_OFFSET(PageAlign), + .kr_partitionkey = IPATH_KREG_OFFSET(RcvPartitionKey), + .kr_portcnt = IPATH_KREG_OFFSET(PortCnt), + .kr_rcvbthqp = IPATH_KREG_OFFSET(RcvBTHQP), + .kr_rcvbufbase = IPATH_KREG_OFFSET(RcvBufBase), + .kr_rcvbufsize = IPATH_KREG_OFFSET(RcvBufSize), + .kr_rcvctrl = IPATH_KREG_OFFSET(RcvCtrl), + .kr_rcvegrbase = IPATH_KREG_OFFSET(RcvEgrBase), + .kr_rcvegrcnt = IPATH_KREG_OFFSET(RcvEgrCnt), + .kr_rcvhdrcnt = IPATH_KREG_OFFSET(RcvHdrCnt), + .kr_rcvhdrentsize = IPATH_KREG_OFFSET(RcvHdrEntSize), + .kr_rcvhdrsize = IPATH_KREG_OFFSET(RcvHdrSize), + .kr_rcvintmembase = IPATH_KREG_OFFSET(RxIntMemBase), + .kr_rcvintmemsize = IPATH_KREG_OFFSET(RxIntMemSize), + .kr_rcvtidbase = IPATH_KREG_OFFSET(RcvTIDBase), + .kr_rcvtidcnt = IPATH_KREG_OFFSET(RcvTIDCnt), + .kr_revision = IPATH_KREG_OFFSET(Revision), + .kr_scratch = IPATH_KREG_OFFSET(Scratch), + .kr_sendbuffererror = IPATH_KREG_OFFSET(SendBufferError), + .kr_sendctrl = IPATH_KREG_OFFSET(SendCtrl), + .kr_sendpioavailaddr = IPATH_KREG_OFFSET(SendAvailAddr), + .kr_sendpiobufbase = IPATH_KREG_OFFSET(SendBufBase), + .kr_sendpiobufcnt = IPATH_KREG_OFFSET(SendBufCnt), + .kr_sendpiosize = IPATH_KREG_OFFSET(SendBufSize), + .kr_sendregbase = IPATH_KREG_OFFSET(SendRegBase), + .kr_txintmembase = IPATH_KREG_OFFSET(TxIntMemBase), + .kr_txintmemsize = IPATH_KREG_OFFSET(TxIntMemSize), + .kr_userregbase = IPATH_KREG_OFFSET(UserRegBase), + + .kr_xgxsconfig = IPATH_KREG_OFFSET(XGXSConfig), + + /* send dma related regs */ + .kr_senddmabase = IPATH_KREG_OFFSET(SendDmaBase), + .kr_senddmalengen = IPATH_KREG_OFFSET(SendDmaLenGen), + .kr_senddmatail = IPATH_KREG_OFFSET(SendDmaTail), + .kr_senddmahead = IPATH_KREG_OFFSET(SendDmaHead), + .kr_senddmaheadaddr = IPATH_KREG_OFFSET(SendDmaHeadAddr), + .kr_senddmabufmask0 = IPATH_KREG_OFFSET(SendDmaBufMask0), + .kr_senddmabufmask1 = IPATH_KREG_OFFSET(SendDmaBufMask1), + .kr_senddmabufmask2 = IPATH_KREG_OFFSET(SendDmaBufMask2), + .kr_senddmastatus = 
IPATH_KREG_OFFSET(SendDmaStatus), + + /* SerDes related regs */ + .kr_ibserdesctrl = IPATH_KREG_OFFSET(IBSerDesCtrl), + .kr_ib_epbacc = IPATH_KREG_OFFSET(IbsdEpbAccCtl), + .kr_ib_epbtrans = IPATH_KREG_OFFSET(IbsdEpbTransReg), + .kr_pcie_epbacc = IPATH_KREG_OFFSET(PcieEpbAccCtl), + .kr_pcie_epbtrans = IPATH_KREG_OFFSET(PcieEpbTransCtl), + .kr_ib_ddsrxeq = IPATH_KREG_OFFSET(SerDesDDSRXEQ), + + /* + * These should not be used directly via ipath_read_kreg64(), + * use them with ipath_read_kreg64_port() + */ + .kr_rcvhdraddr = IPATH_KREG_OFFSET(RcvHdrAddr0), + .kr_rcvhdrtailaddr = IPATH_KREG_OFFSET(RcvHdrTailAddr0), + + /* + * The rcvpktled register controls one of the debug port signals, so + * a packet activity LED can be connected to it. + */ + .kr_rcvpktledcnt = IPATH_KREG_OFFSET(RcvPktLEDCnt), + .kr_pcierbuftestreg0 = IPATH_KREG_OFFSET(PCIeRBufTestReg0), + .kr_pcierbuftestreg1 = IPATH_KREG_OFFSET(PCIeRBufTestReg1), + + .kr_hrtbt_guid = IPATH_KREG_OFFSET(HRTBT_GUID), + .kr_ibcddrctrl = IPATH_KREG_OFFSET(IBCDDRCtrl), + .kr_ibcddrstatus = IPATH_KREG_OFFSET(IBCDDRStatus), + .kr_jintreload = IPATH_KREG_OFFSET(JIntReload) +}; + +static const struct ipath_cregs ipath_7220_cregs = { + .cr_badformatcnt = IPATH_CREG_OFFSET(RxBadFormatCnt), + .cr_erricrccnt = IPATH_CREG_OFFSET(RxICRCErrCnt), + .cr_errlinkcnt = IPATH_CREG_OFFSET(RxLinkProblemCnt), + .cr_errlpcrccnt = IPATH_CREG_OFFSET(RxLPCRCErrCnt), + .cr_errpkey = IPATH_CREG_OFFSET(RxPKeyMismatchCnt), + .cr_errrcvflowctrlcnt = IPATH_CREG_OFFSET(RxFlowCtrlErrCnt), + .cr_err_rlencnt = IPATH_CREG_OFFSET(RxLenErrCnt), + .cr_errslencnt = IPATH_CREG_OFFSET(TxLenErrCnt), + .cr_errtidfull = IPATH_CREG_OFFSET(RxTIDFullErrCnt), + .cr_errtidvalid = IPATH_CREG_OFFSET(RxTIDValidErrCnt), + .cr_errvcrccnt = IPATH_CREG_OFFSET(RxVCRCErrCnt), + .cr_ibstatuschange = IPATH_CREG_OFFSET(IBStatusChangeCnt), + .cr_intcnt = IPATH_CREG_OFFSET(LBIntCnt), + .cr_invalidrlencnt = IPATH_CREG_OFFSET(RxMaxMinLenErrCnt), + .cr_invalidslencnt = 
IPATH_CREG_OFFSET(TxMaxMinLenErrCnt), + .cr_lbflowstallcnt = IPATH_CREG_OFFSET(LBFlowStallCnt), + .cr_pktrcvcnt = IPATH_CREG_OFFSET(RxDataPktCnt), + .cr_pktrcvflowctrlcnt = IPATH_CREG_OFFSET(RxFlowPktCnt), + .cr_pktsendcnt = IPATH_CREG_OFFSET(TxDataPktCnt), + .cr_pktsendflowcnt = IPATH_CREG_OFFSET(TxFlowPktCnt), + .cr_portovflcnt = IPATH_CREG_OFFSET(RxP0HdrEgrOvflCnt), + .cr_rcvebpcnt = IPATH_CREG_OFFSET(RxEBPCnt), + .cr_rcvovflcnt = IPATH_CREG_OFFSET(RxBufOvflCnt), + .cr_senddropped = IPATH_CREG_OFFSET(TxDroppedPktCnt), + .cr_sendstallcnt = IPATH_CREG_OFFSET(TxFlowStallCnt), + .cr_sendunderruncnt = IPATH_CREG_OFFSET(TxUnderrunCnt), + .cr_wordrcvcnt = IPATH_CREG_OFFSET(RxDwordCnt), + .cr_wordsendcnt = IPATH_CREG_OFFSET(TxDwordCnt), + .cr_unsupvlcnt = IPATH_CREG_OFFSET(TxUnsupVLErrCnt), + .cr_rxdroppktcnt = IPATH_CREG_OFFSET(RxDroppedPktCnt), + .cr_iblinkerrrecovcnt = IPATH_CREG_OFFSET(IBLinkErrRecoveryCnt), + .cr_iblinkdowncnt = IPATH_CREG_OFFSET(IBLinkDownedCnt), + .cr_ibsymbolerrcnt = IPATH_CREG_OFFSET(IBSymbolErrCnt), + .cr_vl15droppedpktcnt = IPATH_CREG_OFFSET(RxVL15DroppedPktCnt), + .cr_rxotherlocalphyerrcnt = + IPATH_CREG_OFFSET(RxOtherLocalPhyErrCnt), + .cr_excessbufferovflcnt = IPATH_CREG_OFFSET(ExcessBufferOvflCnt), + .cr_locallinkintegrityerrcnt = + IPATH_CREG_OFFSET(LocalLinkIntegrityErrCnt), + .cr_rxvlerrcnt = IPATH_CREG_OFFSET(RxVlErrCnt), + .cr_rxdlidfltrcnt = IPATH_CREG_OFFSET(RxDlidFltrCnt), + .cr_psstat = IPATH_CREG_OFFSET(PSStat), + .cr_psstart = IPATH_CREG_OFFSET(PSStart), + .cr_psinterval = IPATH_CREG_OFFSET(PSInterval), + .cr_psrcvdatacount = IPATH_CREG_OFFSET(PSRcvDataCount), + .cr_psrcvpktscount = IPATH_CREG_OFFSET(PSRcvPktsCount), + .cr_psxmitdatacount = IPATH_CREG_OFFSET(PSXmitDataCount), + .cr_psxmitpktscount = IPATH_CREG_OFFSET(PSXmitPktsCount), + .cr_psxmitwaitcount = IPATH_CREG_OFFSET(PSXmitWaitCount), +}; + +/* kr_control bits */ +#define INFINIPATH_C_RESET (1U<<7) + +/* kr_intstatus, kr_intclear, kr_intmask bits */ +#define 
INFINIPATH_I_RCVURG_MASK ((1ULL<<17)-1) +#define INFINIPATH_I_RCVURG_SHIFT 32 +#define INFINIPATH_I_RCVAVAIL_MASK ((1ULL<<17)-1) +#define INFINIPATH_I_RCVAVAIL_SHIFT 0 +#define INFINIPATH_I_SERDESTRIMDONE (1ULL<<27) + +/* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ +#define INFINIPATH_HWE_PCIEMEMPARITYERR_MASK 0x00000000000000ffULL +#define INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT 0 +#define INFINIPATH_HWE_PCIEPOISONEDTLP 0x0000000010000000ULL +#define INFINIPATH_HWE_PCIECPLTIMEOUT 0x0000000020000000ULL +#define INFINIPATH_HWE_PCIEBUSPARITYXTLH 0x0000000040000000ULL +#define INFINIPATH_HWE_PCIEBUSPARITYXADM 0x0000000080000000ULL +#define INFINIPATH_HWE_PCIEBUSPARITYRADM 0x0000000100000000ULL +#define INFINIPATH_HWE_COREPLL_FBSLIP 0x0080000000000000ULL +#define INFINIPATH_HWE_COREPLL_RFSLIP 0x0100000000000000ULL +#define INFINIPATH_HWE_PCIE1PLLFAILED 0x0400000000000000ULL +#define INFINIPATH_HWE_PCIE0PLLFAILED 0x0800000000000000ULL +#define INFINIPATH_HWE_SERDESPLLFAILED 0x1000000000000000ULL +/* specific to this chip */ +#define INFINIPATH_HWE_PCIECPLDATAQUEUEERR 0x0000000000000040ULL +#define INFINIPATH_HWE_PCIECPLHDRQUEUEERR 0x0000000000000080ULL +#define INFINIPATH_HWE_SDMAMEMREADERR 0x0000000010000000ULL +#define INFINIPATH_HWE_CLK_UC_PLLNOTLOCKED 0x2000000000000000ULL +#define INFINIPATH_HWE_PCIESERDESQ0PCLKNOTDETECT 0x0100000000000000ULL +#define INFINIPATH_HWE_PCIESERDESQ1PCLKNOTDETECT 0x0200000000000000ULL +#define INFINIPATH_HWE_PCIESERDESQ2PCLKNOTDETECT 0x0400000000000000ULL +#define INFINIPATH_HWE_PCIESERDESQ3PCLKNOTDETECT 0x0800000000000000ULL +#define INFINIPATH_HWE_DDSRXEQMEMORYPARITYERR 0x0000008000000000ULL +#define INFINIPATH_HWE_IB_UC_MEMORYPARITYERR 0x0000004000000000ULL +#define INFINIPATH_HWE_PCIE_UC_OCT0MEMORYPARITYERR 0x0000001000000000ULL +#define INFINIPATH_HWE_PCIE_UC_OCT1MEMORYPARITYERR 0x0000002000000000ULL + +#define IBA7220_IBCS_LINKTRAININGSTATE_MASK 0x1F +#define IBA7220_IBCS_LINKSTATE_SHIFT 5 +#define 
IBA7220_IBCS_LINKSPEED_SHIFT 8 +#define IBA7220_IBCS_LINKWIDTH_SHIFT 9 + +#define IBA7220_IBCC_LINKINITCMD_MASK 0x7ULL +#define IBA7220_IBCC_LINKCMD_SHIFT 19 +#define IBA7220_IBCC_MAXPKTLEN_SHIFT 21 + +/* kr_ibcddrctrl bits */ +#define IBA7220_IBC_DLIDLMC_MASK 0xFFFFFFFFUL +#define IBA7220_IBC_DLIDLMC_SHIFT 32 +#define IBA7220_IBC_HRTBT_MASK 3 +#define IBA7220_IBC_HRTBT_SHIFT 16 +#define IBA7220_IBC_HRTBT_ENB 0x10000UL +#define IBA7220_IBC_LANE_REV_SUPPORTED (1<<8) +#define IBA7220_IBC_LREV_MASK 1 +#define IBA7220_IBC_LREV_SHIFT 8 +#define IBA7220_IBC_RXPOL_MASK 1 +#define IBA7220_IBC_RXPOL_SHIFT 7 +#define IBA7220_IBC_WIDTH_SHIFT 5 +#define IBA7220_IBC_WIDTH_MASK 0x3 +#define IBA7220_IBC_WIDTH_1X_ONLY (0<ipath_p0_rcvegrcnt + + (port-1) * dd->ipath_rcvegrcnt : 0; +} + +static void ipath_7220_txe_recover(struct ipath_devdata *dd) +{ + ++ipath_stats.sps_txeparity; + + dev_info(&dd->pcidev->dev, + "Recovering from TXE PIO parity error\n"); + ipath_disarm_senderrbufs(dd, 1); +} + + +/** + * ipath_7220_handle_hwerrors - display hardware errors. + * @dd: the infinipath device + * @msg: the output buffer + * @msgl: the size of the output buffer + * + * Use same msg buffer as regular errors to avoid excessive stack + * use. Most hardware errors are catastrophic, but for right now, + * we'll print them and continue. We reuse the same message buffer as + * ipath_handle_errors() to avoid excessive stack usage. + */ +static void ipath_7220_handle_hwerrors(struct ipath_devdata *dd, char *msg, + size_t msgl) +{ + ipath_err_t hwerrs; + u32 bits, ctrl; + int isfatal = 0; + char bitsmsg[64]; + int log_idx; + + hwerrs = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus); + if (!hwerrs) { + /* + * better than printing confusing messages + * This seems to be related to clearing the crc error, or + * the pll error during init. 
+ */ + ipath_cdbg(VERBOSE, "Called but no hardware errors set\n"); + goto bail; + } else if (hwerrs == ~0ULL) { + ipath_dev_err(dd, "Read of hardware error status failed " + "(all bits set); ignoring\n"); + goto bail; + } + ipath_stats.sps_hwerrs++; + + /* + * Always clear the error status register, except MEMBISTFAIL, + * regardless of whether we continue or stop using the chip. + * We want that set so we know it failed, even across driver reload. + * We'll still ignore it in the hwerrmask. We do this partly for + * diagnostics, but also for support. + */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrclear, + hwerrs&~INFINIPATH_HWE_MEMBISTFAILED); + + hwerrs &= dd->ipath_hwerrmask; + + /* We log some errors to EEPROM, check if we have any of those. */ + for (log_idx = 0; log_idx < IPATH_EEP_LOG_CNT; ++log_idx) + if (hwerrs & dd->ipath_eep_st_masks[log_idx].hwerrs_to_log) + ipath_inc_eeprom_err(dd, log_idx, 1); + /* + * Make sure we get this much out, unless told to be quiet, + * or it's occurred within the last 5 seconds. 
+ */ + if ((hwerrs & ~(dd->ipath_lasthwerror | + ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT))) || + (ipath_debug & __IPATH_VERBDBG)) + dev_info(&dd->pcidev->dev, "Hardware error: hwerr=0x%llx " + "(cleared)\n", (unsigned long long) hwerrs); + dd->ipath_lasthwerror |= hwerrs; + + if (hwerrs & ~dd->ipath_hwe_bitsextant) + ipath_dev_err(dd, "hwerror interrupt with unknown errors " + "%llx set\n", (unsigned long long) + (hwerrs & ~dd->ipath_hwe_bitsextant)); + + if (hwerrs & INFINIPATH_HWE_IB_UC_MEMORYPARITYERR) + ipath_sd7220_clr_ibpar(dd); + + ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control); + if ((ctrl & INFINIPATH_C_FREEZEMODE) && !ipath_diag_inuse) { + /* + * Parity errors in send memory are recoverable, + * just cancel the send (if indicated in sendbuffererror), + * count the occurrence, unfreeze (if no other handled + * hardware error bits are set), and continue. + */ + if (hwerrs & ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) { + ipath_7220_txe_recover(dd); + hwerrs &= ~((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT); + if (!hwerrs) { + /* else leave in freeze mode */ + ipath_write_kreg(dd, + dd->ipath_kregs->kr_control, + dd->ipath_control); + goto bail; + } + } + if (hwerrs) { + /* + * If any set that we aren't ignoring, only make the + * complaint once, in case it's stuck or recurring, + * and we get here multiple times. + * Force link down, so switch knows, and + * LEDs are turned off. 
+ */ + if (dd->ipath_flags & IPATH_INITTED) { + ipath_set_linkstate(dd, IPATH_IB_LINKDOWN); + ipath_setup_7220_setextled(dd, + INFINIPATH_IBCS_L_STATE_DOWN, + INFINIPATH_IBCS_LT_STATE_DISABLED); + ipath_dev_err(dd, "Fatal Hardware Error " + "(freeze mode), no longer" + " usable, SN %.16s\n", + dd->ipath_serial); + isfatal = 1; + } + /* + * Mark as having had an error for driver, and also + * for /sys and status word mapped to user programs. + * This marks unit as not usable, until reset. + */ + *dd->ipath_statusp &= ~IPATH_STATUS_IB_READY; + *dd->ipath_statusp |= IPATH_STATUS_HWERROR; + dd->ipath_flags &= ~IPATH_INITTED; + } else { + ipath_dbg("Clearing freezemode on ignored hardware " + "error\n"); + ipath_clear_freeze(dd); + } + } + + *msg = '\0'; + + if (hwerrs & INFINIPATH_HWE_MEMBISTFAILED) { + strlcat(msg, "[Memory BIST test failed, " + "InfiniPath hardware unusable]", msgl); + /* ignore from now on, so disable until driver reloaded */ + *dd->ipath_statusp |= IPATH_STATUS_HWERROR; + dd->ipath_hwerrmask &= ~INFINIPATH_HWE_MEMBISTFAILED; + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask); + } + + ipath_format_hwerrors(hwerrs, + ipath_7220_hwerror_msgs, + ARRAY_SIZE(ipath_7220_hwerror_msgs), + msg, msgl); + + if (hwerrs & (INFINIPATH_HWE_PCIEMEMPARITYERR_MASK + << INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT)) { + bits = (u32) ((hwerrs >> + INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT) & + INFINIPATH_HWE_PCIEMEMPARITYERR_MASK); + snprintf(bitsmsg, sizeof bitsmsg, + "[PCIe Mem Parity Errs %x] ", bits); + strlcat(msg, bitsmsg, msgl); + } + +#define _IPATH_PLL_FAIL (INFINIPATH_HWE_COREPLL_FBSLIP | \ + INFINIPATH_HWE_COREPLL_RFSLIP) + + if (hwerrs & _IPATH_PLL_FAIL) { + snprintf(bitsmsg, sizeof bitsmsg, + "[PLL failed (%llx), InfiniPath hardware unusable]", + (unsigned long long) hwerrs & _IPATH_PLL_FAIL); + strlcat(msg, bitsmsg, msgl); + /* ignore from now on, so disable until driver reloaded */ + dd->ipath_hwerrmask &= ~(hwerrs & _IPATH_PLL_FAIL); + 
ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask); + } + + if (hwerrs & INFINIPATH_HWE_SERDESPLLFAILED) { + /* + * If it occurs, it is left masked since the external + * interface is unused. + */ + dd->ipath_hwerrmask &= ~INFINIPATH_HWE_SERDESPLLFAILED; + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask); + } + + ipath_dev_err(dd, "%s hardware error\n", msg); + /* + * For /sys status file. If no trailing } is copied, we'll + * know it was truncated. + */ + if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg) + snprintf(dd->ipath_freezemsg, dd->ipath_freezelen, + "{%s}", msg); +bail:; +} + +/** + * ipath_7220_boardname - fill in the board name + * @dd: the infinipath device + * @name: the output buffer + * @namelen: the size of the output buffer + * + * info is based on the board revision register + */ +static int ipath_7220_boardname(struct ipath_devdata *dd, char *name, + size_t namelen) +{ + char *n = NULL; + u8 boardrev = dd->ipath_boardrev; + int ret; + + if (boardrev == 15) { + /* + * Emulator sometimes comes up all-ones, rather than zero. 
+ */ + boardrev = 0; + dd->ipath_boardrev = boardrev; + } + switch (boardrev) { + case 0: + n = "InfiniPath_7220_Emulation"; + break; + case 1: + n = "InfiniPath_QLE7240"; + break; + case 2: + n = "InfiniPath_QLE7280"; + break; + case 3: + n = "InfiniPath_QLE7242"; + break; + case 4: + n = "InfiniPath_QEM7240"; + break; + case 5: + n = "InfiniPath_QMI7240"; + break; + case 6: + n = "InfiniPath_QMI7264"; + break; + case 7: + n = "InfiniPath_QMH7240"; + break; + case 8: + n = "InfiniPath_QME7240"; + break; + case 9: + n = "InfiniPath_QLE7250"; + break; + case 10: + n = "InfiniPath_QLE7290"; + break; + case 11: + n = "InfiniPath_QEM7250"; + break; + case 12: + n = "InfiniPath_QLE-Bringup"; + break; + default: + ipath_dev_err(dd, + "Don't yet know about board with ID %u\n", + boardrev); + snprintf(name, namelen, "Unknown_InfiniPath_PCIe_%u", + boardrev); + break; + } + if (n) + snprintf(name, namelen, "%s", n); + + if (dd->ipath_majrev != 5 || !dd->ipath_minrev || + dd->ipath_minrev > 2) { + ipath_dev_err(dd, "Unsupported InfiniPath hardware " + "revision %u.%u!\n", + dd->ipath_majrev, dd->ipath_minrev); + ret = 1; + } else if (dd->ipath_minrev == 1) { + /* Rev1 chips are prototype. Complain, but allow use */ + ipath_dev_err(dd, "Unsupported hardware " + "revision %u.%u, Contact support at qlogic.com\n", + dd->ipath_majrev, dd->ipath_minrev); + ret = 0; + } else + ret = 0; + + /* + * Set here not in ipath_init_*_funcs because we have to do + * it after we can read chip registers. 
+ */ + dd->ipath_ureg_align = 0x10000; /* 64KB alignment */ + + return ret; +} + +/** + * ipath_7220_init_hwerrors - enable hardware errors + * @dd: the infinipath device + * + * now that we have finished initializing everything that might reasonably + * cause a hardware error, and cleared those error bits as they occur, + * we can enable hardware errors in the mask (potentially enabling + * freeze mode), and enable hardware errors as errors (along with + * everything else) in errormask + */ +static void ipath_7220_init_hwerrors(struct ipath_devdata *dd) +{ + ipath_err_t val; + u64 extsval; + + extsval = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extstatus); + + if (!(extsval & (INFINIPATH_EXTS_MEMBIST_ENDTEST | + INFINIPATH_EXTS_MEMBIST_DISABLED))) + ipath_dev_err(dd, "MemBIST did not complete!\n"); + if (extsval & INFINIPATH_EXTS_MEMBIST_DISABLED) + dev_info(&dd->pcidev->dev, "MemBIST is disabled.\n"); + + val = ~0ULL; /* barring bugs, all hwerrors become interrupts, */ + + if (!dd->ipath_boardrev) /* no PLL for Emulator */ + val &= ~INFINIPATH_HWE_SERDESPLLFAILED; + + if (dd->ipath_minrev == 1) + val &= ~(1ULL << 42); /* TXE LaunchFIFO Parity rev1 issue */ + + val &= ~INFINIPATH_HWE_IB_UC_MEMORYPARITYERR; + dd->ipath_hwerrmask = val; + + /* + * special trigger "error" is for debugging purposes. It + * works around a processor/chipset problem. The error + * interrupt allows us to count occurrences, but we don't + * want to pay the overhead for normal use. Emulation only + */ + if (!dd->ipath_boardrev) + dd->ipath_maskederrs = INFINIPATH_E_SENDSPECIALTRIGGER; +} + +/* + * All detailed interaction with the SerDes has been moved to ipath_sd7220.c + * + * The portion of IBA7220-specific bringup_serdes() that actually deals with + * registers and memory within the SerDes itself is ipath_sd7220_init(). 
+ */ + +/** + * ipath_7220_bringup_serdes - bring up the serdes + * @dd: the infinipath device + */ +static int ipath_7220_bringup_serdes(struct ipath_devdata *dd) +{ + int ret = 0; + u64 val, prev_val, guid; + int was_reset; /* Note whether uC was reset */ + + ipath_dbg("Trying to bringup serdes\n"); + + if (ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus) & + INFINIPATH_HWE_SERDESPLLFAILED) { + ipath_dbg("At start, serdes PLL failed bit set " + "in hwerrstatus, clearing and continuing\n"); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrclear, + INFINIPATH_HWE_SERDESPLLFAILED); + } + + if (!dd->ipath_ibcddrctrl) { + /* not on re-init after reset */ + dd->ipath_ibcddrctrl = + ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcddrctrl); + + if (dd->ipath_link_speed_enabled == + (IPATH_IB_SDR | IPATH_IB_DDR)) + dd->ipath_ibcddrctrl |= + IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK; + else + dd->ipath_ibcddrctrl |= + dd->ipath_link_speed_enabled == IPATH_IB_DDR + ? IBA7220_IBC_SPEED_DDR : + IBA7220_IBC_SPEED_SDR; + if ((dd->ipath_link_width_enabled & (IB_WIDTH_1X | + IB_WIDTH_4X)) == (IB_WIDTH_1X | IB_WIDTH_4X)) + dd->ipath_ibcddrctrl |= IBA7220_IBC_WIDTH_AUTONEG; + else + dd->ipath_ibcddrctrl |= + dd->ipath_link_width_enabled == IB_WIDTH_4X + ? 
IBA7220_IBC_WIDTH_4X_ONLY : + IBA7220_IBC_WIDTH_1X_ONLY; + + /* always enable these on driver reload, not sticky */ + dd->ipath_ibcddrctrl |= + IBA7220_IBC_RXPOL_MASK << IBA7220_IBC_RXPOL_SHIFT; + dd->ipath_ibcddrctrl |= + IBA7220_IBC_HRTBT_MASK << IBA7220_IBC_HRTBT_SHIFT; + /* + * automatic lane reversal detection for receive + * doesn't work correctly in rev 1, so disable it + * on that rev, otherwise enable (disabling not + * sticky across reload for >rev1) + */ + if (dd->ipath_minrev == 1) + dd->ipath_ibcddrctrl &= + ~IBA7220_IBC_LANE_REV_SUPPORTED; + else + dd->ipath_ibcddrctrl |= + IBA7220_IBC_LANE_REV_SUPPORTED; + } + + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcddrctrl, + dd->ipath_ibcddrctrl); + + ipath_write_kreg(dd, IPATH_KREG_OFFSET(IBNCModeCtrl), 0Ull); + + /* IBA7220 has SERDES MPU reset in D0 of what _was_ IBPLLCfg */ + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibserdesctrl); + /* remember if uC was in Reset or not, for dactrim */ + was_reset = (val & 1); + ipath_cdbg(VERBOSE, "IBReset %s xgxsconfig %llx\n", + was_reset ? "Asserted" : "Negated", (unsigned long long) + ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig)); + + if (dd->ipath_boardrev) { + /* + * Hardware is not emulator, and may have been reset. Init it. 
+ * Below will release reset, but needs to know if chip was + * originally in reset, to only trim DACs on first time + * after chip reset or powercycle (not driver reload) + */ + ret = ipath_sd7220_init(dd, was_reset); + } + + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig); + prev_val = val; + val |= INFINIPATH_XGXS_FC_SAFE; + if (val != prev_val) { + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + } + if (val & INFINIPATH_XGXS_RESET) + val &= ~INFINIPATH_XGXS_RESET; + if (val != prev_val) + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); + + ipath_cdbg(VERBOSE, "done: xgxs=%llx from %llx\n", + (unsigned long long) + ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig), + prev_val); + + guid = be64_to_cpu(dd->ipath_guid); + + if (!guid) { + /* have to have something, so use likely unique tsc */ + guid = get_cycles(); + ipath_dbg("No GUID for heartbeat, faking %llx\n", + (unsigned long long)guid); + } else + ipath_cdbg(VERBOSE, "Wrote %llX to HRTBT_GUID\n", guid); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hrtbt_guid, guid); + return ret; +} + +static void ipath_7220_config_jint(struct ipath_devdata *dd, + u16 idle_ticks, u16 max_packets) +{ + + /* + * We can request a receive interrupt for 1 or more packets + * from current offset. + */ + if (idle_ticks == 0 || max_packets == 0) + /* interrupt after one packet if no mitigation */ + dd->ipath_rhdrhead_intr_off = + 1ULL << IBA7220_HDRHEAD_PKTINT_SHIFT; + else + /* Turn off RcvHdrHead interrupts if using mitigation */ + dd->ipath_rhdrhead_intr_off = 0ULL; + + /* refresh kernel RcvHdrHead registers... 
*/ + ipath_write_ureg(dd, ur_rcvhdrhead, + dd->ipath_rhdrhead_intr_off | + dd->ipath_pd[0]->port_head, 0); + + dd->ipath_jint_max_packets = max_packets; + dd->ipath_jint_idle_ticks = idle_ticks; + ipath_write_kreg(dd, dd->ipath_kregs->kr_jintreload, + ((u64) max_packets << INFINIPATH_JINT_PACKETSHIFT) | + idle_ticks); +} + +/** + * ipath_7220_quiet_serdes - set serdes to txidle + * @dd: the infinipath device + * Called when driver is being unloaded + */ +static void ipath_7220_quiet_serdes(struct ipath_devdata *dd) +{ + u64 val; + dd->ipath_flags &= ~IPATH_IB_AUTONEG_INPROG; + wake_up(&dd->ipath_autoneg_wait); + cancel_delayed_work(&dd->ipath_autoneg_work); + flush_scheduled_work(); + ipath_shutdown_relock_poll(dd); + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig); + val |= INFINIPATH_XGXS_RESET; + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); +} + +static int ipath_7220_intconfig(struct ipath_devdata *dd) +{ + ipath_7220_config_jint(dd, dd->ipath_jint_idle_ticks, + dd->ipath_jint_max_packets); + return 0; +} + +/** + * ipath_setup_7220_setextled - set the state of the two external LEDs + * @dd: the infinipath device + * @lst: the L state + * @ltst: the LT state + * + * These LEDs indicate the physical and logical state of IB link. + * For this chip (at least with recommended board pinouts), LED1 + * is Yellow (logical state) and LED2 is Green (physical state), + * + * Note: We try to match the Mellanox HCA LED behavior as best + * we can. Green indicates physical link state is OK (something is + * plugged in, and we can train). + * Amber indicates the link is logically up (ACTIVE). + * Mellanox further blinks the amber LED to indicate data packet + * activity, but we have no hardware support for that, so it would + * require waking up every 10-20 msecs and checking the counters + * on the chip, and then turning the LED off if appropriate. That's + * visible overhead, so not something we will do. 
+ * + */ +static void ipath_setup_7220_setextled(struct ipath_devdata *dd, u64 lst, + u64 ltst) +{ + u64 extctl, ledblink = 0; + unsigned long flags = 0; + + /* the diags use the LED to indicate diag info, so we leave + * the external LED alone when the diags are running */ + if (ipath_diag_inuse) + return; + + /* Allow override of LED display for, e.g. Locating system in rack */ + if (dd->ipath_led_override) { + ltst = (dd->ipath_led_override & IPATH_LED_PHYS) + ? INFINIPATH_IBCS_LT_STATE_LINKUP + : INFINIPATH_IBCS_LT_STATE_DISABLED; + lst = (dd->ipath_led_override & IPATH_LED_LOG) + ? INFINIPATH_IBCS_L_STATE_ACTIVE + : INFINIPATH_IBCS_L_STATE_DOWN; + } + + spin_lock_irqsave(&dd->ipath_gpio_lock, flags); + extctl = dd->ipath_extctrl & ~(INFINIPATH_EXTC_LED1PRIPORT_ON | + INFINIPATH_EXTC_LED2PRIPORT_ON); + if (ltst == INFINIPATH_IBCS_LT_STATE_LINKUP) { + extctl |= INFINIPATH_EXTC_LED1PRIPORT_ON; + /* + * counts are in chip clock (4ns) periods. + * This is 1/16 sec (66.6ms) on, + * 3/16 sec (187.5 ms) off, with packets rcvd + */ + ledblink = ((66600*1000UL/4) << IBA7220_LEDBLINK_ON_SHIFT) + | ((187500*1000UL/4) << IBA7220_LEDBLINK_OFF_SHIFT); + } + if (lst == INFINIPATH_IBCS_L_STATE_ACTIVE) + extctl |= INFINIPATH_EXTC_LED2PRIPORT_ON; + dd->ipath_extctrl = extctl; + ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, extctl); + spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags); + + if (ledblink) /* blink the LED on packet receive */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvpktledcnt, + ledblink); +} + +/* + * Similar to pci_intx(pdev, 1), except that we make sure + * msi is off... 
+ */ +static void ipath_enable_intx(struct pci_dev *pdev) +{ + u16 cw, new; + int pos; + + /* first, turn on INTx */ + pci_read_config_word(pdev, PCI_COMMAND, &cw); + new = cw & ~PCI_COMMAND_INTX_DISABLE; + if (new != cw) + pci_write_config_word(pdev, PCI_COMMAND, new); + + /* then turn off MSI */ + pos = pci_find_capability(pdev, PCI_CAP_ID_MSI); + if (pos) { + pci_read_config_word(pdev, pos + PCI_MSI_FLAGS, &cw); + new = cw & ~PCI_MSI_FLAGS_ENABLE; + if (new != cw) + pci_write_config_word(pdev, pos + PCI_MSI_FLAGS, new); + } +} + +static int ipath_msi_enabled(struct pci_dev *pdev) +{ + int pos, ret = 0; + + pos = pci_find_capability(pdev, PCI_CAP_ID_MSI); + if (pos) { + u16 cw; + + pci_read_config_word(pdev, pos + PCI_MSI_FLAGS, &cw); + ret = !!(cw & PCI_MSI_FLAGS_ENABLE); + } + return ret; +} + +/* + * disable msi interrupt if enabled, and clear the flag. + * flag is used primarily for the fallback to IntX, but + * is also used in reinit after reset as a flag. + */ +static void ipath_7220_nomsi(struct ipath_devdata *dd) +{ + dd->ipath_msi_lo = 0; +#ifdef CONFIG_PCI_MSI + if (ipath_msi_enabled(dd->pcidev)) { + /* + * free, but don't zero; later kernels require + * it be freed before disable_msi, so the intx + * setup has to request it again. + */ + if (dd->ipath_irq) + free_irq(dd->ipath_irq, dd); + pci_disable_msi(dd->pcidev); + } +#endif +} + +/* + * ipath_setup_7220_cleanup - clean up any per-chip chip-specific stuff + * @dd: the infinipath device + * + * Nothing but msi interrupt cleanup for now. + * + * This is called during driver unload. 
+ */
+static void ipath_setup_7220_cleanup(struct ipath_devdata *dd)
+{
+	ipath_7220_nomsi(dd);
+}
+
+
+static void ipath_7220_pcie_params(struct ipath_devdata *dd, u32 boardrev)
+{
+	u16 linkstat, minwidth, speed;
+	int pos;
+
+	pos = pci_find_capability(dd->pcidev, PCI_CAP_ID_EXP);
+	if (!pos) {
+		ipath_dev_err(dd, "Can't find PCI Express capability!\n");
+		goto bail;
+	}
+
+	pci_read_config_word(dd->pcidev, pos + PCI_EXP_LNKSTA,
+			     &linkstat);
+	/*
+	 * speed is bits 0-3, linkwidth is bits 4-8
+	 * no defines for them in headers
+	 */
+	speed = linkstat & 0xf;
+	linkstat >>= 4;
+	linkstat &= 0x1f;
+	dd->ipath_lbus_width = linkstat;
+	switch (boardrev) {
+	case 0:
+	case 2:
+	case 10:
+	case 12:
+		minwidth = 16; /* x16 capable boards */
+		break;
+	default:
+		minwidth = 8; /* x8 capable boards */
+		break;
+	}
+
+	switch (speed) {
+	case 1:
+		dd->ipath_lbus_speed = 2500; /* Gen1, 2.5GHz */
+		break;
+	case 2:
+		dd->ipath_lbus_speed = 5000; /* Gen2, 5GHz */
+		break;
+	default: /* not defined, assume gen1 */
+		dd->ipath_lbus_speed = 2500;
+		break;
+	}
+
+	if (linkstat < minwidth)
+		ipath_dev_err(dd,
+			      "PCIe width %u (x%u HCA), performance "
+			      "reduced\n", linkstat, minwidth);
+	else
+		ipath_cdbg(VERBOSE, "PCIe speed %u width %u (x%u HCA)\n",
+			   dd->ipath_lbus_speed, linkstat, minwidth);
+
+	if (speed != 1)
+		ipath_dev_err(dd,
+			      "PCIe linkspeed %u is incorrect; "
+			      "should be 1 (2500)!\n", speed);
+
+bail:
+	/* fill in string, even on errors */
+	snprintf(dd->ipath_lbus_info, sizeof(dd->ipath_lbus_info),
+		 "PCIe,%uMHz,x%u\n",
+		 dd->ipath_lbus_speed,
+		 dd->ipath_lbus_width);
+	return;
+}
+
+
+/**
+ * ipath_setup_7220_config - setup PCIe config related stuff
+ * @dd: the infinipath device
+ * @pdev: the PCI device
+ *
+ * The pci_enable_msi() call will fail on systems with MSI quirks
+ * such as those with AMD8131, even if the device of interest is not
+ * attached to that device (in the 2.6.13 - 2.6.15 kernels, at least; fixed
+ * late in 2.6.16).
+ * All that can be done is to edit the kernel source to remove the quirk
+ * check until that is fixed.
+ * We do not need to call enable_msi() for our HyperTransport chip,
+ * even though it uses MSI, and we want to avoid the quirk warning, so
+ * we call enable_msi only for PCIe. If we do end up needing
+ * pci_enable_msi at some point in the future for HT, we'll move the
+ * call back into the main init_one code.
+ * We save the msi lo and hi values, so we can restore them after
+ * chip reset (the kernel PCI infrastructure doesn't yet handle that
+ * correctly).
+ */
+static int ipath_setup_7220_config(struct ipath_devdata *dd,
+				   struct pci_dev *pdev)
+{
+	int pos, ret = -1;
+	u32 boardrev;
+
+	dd->ipath_msi_lo = 0;	/* used as a flag during reset processing */
+#ifdef CONFIG_PCI_MSI
+	pos = pci_find_capability(pdev, PCI_CAP_ID_MSI);
+	if (!strcmp(int_type, "force_msi") || !strcmp(int_type, "auto"))
+		ret = pci_enable_msi(pdev);
+	if (ret) {
+		if (!strcmp(int_type, "force_msi")) {
+			ipath_dev_err(dd, "pci_enable_msi failed: %d, "
+				      "force_msi is on, so not continuing.\n",
+				      ret);
+			return ret;
+		}
+
+		ipath_enable_intx(pdev);
+		if (!strcmp(int_type, "auto"))
+			ipath_dev_err(dd, "pci_enable_msi failed: %d, "
+				      "falling back to INTx\n", ret);
+	} else if (pos) {
+		u16 control;
+		pci_read_config_dword(pdev, pos + PCI_MSI_ADDRESS_LO,
+				      &dd->ipath_msi_lo);
+		pci_read_config_dword(pdev, pos + PCI_MSI_ADDRESS_HI,
+				      &dd->ipath_msi_hi);
+		pci_read_config_word(pdev, pos + PCI_MSI_FLAGS,
+				     &control);
+		/* now save the data (vector) info */
+		pci_read_config_word(pdev,
+				     pos + ((control & PCI_MSI_FLAGS_64BIT)
+					    ?
PCI_MSI_DATA_64 : + PCI_MSI_DATA_32), + &dd->ipath_msi_data); + } else + ipath_dev_err(dd, "Can't find MSI capability, " + "can't save MSI settings for reset\n"); +#else + ipath_dbg("PCI_MSI not configured, using IntX interrupts\n"); + ipath_enable_intx(pdev); +#endif + + dd->ipath_irq = pdev->irq; + + /* + * We save the cachelinesize also, although it doesn't + * really matter. + */ + pci_read_config_byte(pdev, PCI_CACHE_LINE_SIZE, + &dd->ipath_pci_cacheline); + + /* + * this function called early, ipath_boardrev not set yet. Can't + * use ipath_read_kreg64() yet, too early in init, so use readq() + */ + boardrev = (readq(&dd->ipath_kregbase[dd->ipath_kregs->kr_revision]) + >> INFINIPATH_R_BOARDID_SHIFT) & INFINIPATH_R_BOARDID_MASK; + + ipath_7220_pcie_params(dd, boardrev); + + dd->ipath_flags |= IPATH_NODMA_RTAIL | IPATH_HAS_SEND_DMA | + IPATH_HAS_PBC_CNT | IPATH_HAS_THRESH_UPDATE; + dd->ipath_pioupd_thresh = 4U; /* set default update threshold */ + return 0; +} + +static void ipath_init_7220_variables(struct ipath_devdata *dd) +{ + /* + * setup the register offsets, since they are different for each + * chip + */ + dd->ipath_kregs = &ipath_7220_kregs; + dd->ipath_cregs = &ipath_7220_cregs; + + /* + * bits for selecting i2c direction and values, + * used for I2C serial flash + */ + dd->ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM; + dd->ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM; + dd->ipath_gpio_sda = IPATH_GPIO_SDA; + dd->ipath_gpio_scl = IPATH_GPIO_SCL; + + /* + * Fill in data for field-values that change in IBA7220. + * We dynamically specify only the mask for LINKTRAININGSTATE + * and only the shift for LINKSTATE, as they are the only ones + * that change. Also precalculate the 3 link states of interest + * and the combined mask. 
+ */ + dd->ibcs_ls_shift = IBA7220_IBCS_LINKSTATE_SHIFT; + dd->ibcs_lts_mask = IBA7220_IBCS_LINKTRAININGSTATE_MASK; + dd->ibcs_mask = (INFINIPATH_IBCS_LINKSTATE_MASK << + dd->ibcs_ls_shift) | dd->ibcs_lts_mask; + dd->ib_init = (INFINIPATH_IBCS_LT_STATE_LINKUP << + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | + (INFINIPATH_IBCS_L_STATE_INIT << dd->ibcs_ls_shift); + dd->ib_arm = (INFINIPATH_IBCS_LT_STATE_LINKUP << + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | + (INFINIPATH_IBCS_L_STATE_ARM << dd->ibcs_ls_shift); + dd->ib_active = (INFINIPATH_IBCS_LT_STATE_LINKUP << + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | + (INFINIPATH_IBCS_L_STATE_ACTIVE << dd->ibcs_ls_shift); + + /* + * Fill in data for ibcc field-values that change in IBA7220. + * We dynamically specify only the mask for LINKINITCMD + * and only the shift for LINKCMD and MAXPKTLEN, as they are + * the only ones that change. + */ + dd->ibcc_lic_mask = IBA7220_IBCC_LINKINITCMD_MASK; + dd->ibcc_lc_shift = IBA7220_IBCC_LINKCMD_SHIFT; + dd->ibcc_mpl_shift = IBA7220_IBCC_MAXPKTLEN_SHIFT; + + /* Fill in shifts for RcvCtrl. 
*/ + dd->ipath_r_portenable_shift = INFINIPATH_R_PORTENABLE_SHIFT; + dd->ipath_r_intravail_shift = IBA7220_R_INTRAVAIL_SHIFT; + dd->ipath_r_tailupd_shift = IBA7220_R_TAILUPD_SHIFT; + dd->ipath_r_portcfg_shift = IBA7220_R_PORTCFG_SHIFT; + + /* variables for sanity checking interrupt and errors */ + dd->ipath_hwe_bitsextant = + (INFINIPATH_HWE_RXEMEMPARITYERR_MASK << + INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) | + (INFINIPATH_HWE_TXEMEMPARITYERR_MASK << + INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) | + (INFINIPATH_HWE_PCIEMEMPARITYERR_MASK << + INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT) | + INFINIPATH_HWE_PCIE1PLLFAILED | + INFINIPATH_HWE_PCIE0PLLFAILED | + INFINIPATH_HWE_PCIEPOISONEDTLP | + INFINIPATH_HWE_PCIECPLTIMEOUT | + INFINIPATH_HWE_PCIEBUSPARITYXTLH | + INFINIPATH_HWE_PCIEBUSPARITYXADM | + INFINIPATH_HWE_PCIEBUSPARITYRADM | + INFINIPATH_HWE_MEMBISTFAILED | + INFINIPATH_HWE_COREPLL_FBSLIP | + INFINIPATH_HWE_COREPLL_RFSLIP | + INFINIPATH_HWE_SERDESPLLFAILED | + INFINIPATH_HWE_IBCBUSTOSPCPARITYERR | + INFINIPATH_HWE_IBCBUSFRSPCPARITYERR | + INFINIPATH_HWE_PCIECPLDATAQUEUEERR | + INFINIPATH_HWE_PCIECPLHDRQUEUEERR | + INFINIPATH_HWE_SDMAMEMREADERR | + INFINIPATH_HWE_CLK_UC_PLLNOTLOCKED | + INFINIPATH_HWE_PCIESERDESQ0PCLKNOTDETECT | + INFINIPATH_HWE_PCIESERDESQ1PCLKNOTDETECT | + INFINIPATH_HWE_PCIESERDESQ2PCLKNOTDETECT | + INFINIPATH_HWE_PCIESERDESQ3PCLKNOTDETECT | + INFINIPATH_HWE_DDSRXEQMEMORYPARITYERR | + INFINIPATH_HWE_IB_UC_MEMORYPARITYERR | + INFINIPATH_HWE_PCIE_UC_OCT0MEMORYPARITYERR | + INFINIPATH_HWE_PCIE_UC_OCT1MEMORYPARITYERR; + dd->ipath_i_bitsextant = + INFINIPATH_I_SDMAINT | INFINIPATH_I_SDMADISABLED | + (INFINIPATH_I_RCVURG_MASK << INFINIPATH_I_RCVURG_SHIFT) | + (INFINIPATH_I_RCVAVAIL_MASK << + INFINIPATH_I_RCVAVAIL_SHIFT) | + INFINIPATH_I_ERROR | INFINIPATH_I_SPIOSENT | + INFINIPATH_I_SPIOBUFAVAIL | INFINIPATH_I_GPIO | + INFINIPATH_I_JINT | INFINIPATH_I_SERDESTRIMDONE; + dd->ipath_e_bitsextant = + INFINIPATH_E_RFORMATERR | INFINIPATH_E_RVCRC | + 
INFINIPATH_E_RICRC | INFINIPATH_E_RMINPKTLEN | + INFINIPATH_E_RMAXPKTLEN | INFINIPATH_E_RLONGPKTLEN | + INFINIPATH_E_RSHORTPKTLEN | INFINIPATH_E_RUNEXPCHAR | + INFINIPATH_E_RUNSUPVL | INFINIPATH_E_REBP | + INFINIPATH_E_RIBFLOW | INFINIPATH_E_RBADVERSION | + INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_RBADTID | INFINIPATH_E_RHDRLEN | + INFINIPATH_E_RHDR | INFINIPATH_E_RIBLOSTLINK | + INFINIPATH_E_SENDSPECIALTRIGGER | + INFINIPATH_E_SDMADISABLED | INFINIPATH_E_SMINPKTLEN | + INFINIPATH_E_SMAXPKTLEN | INFINIPATH_E_SUNDERRUN | + INFINIPATH_E_SPKTLEN | INFINIPATH_E_SDROPPEDSMPPKT | + INFINIPATH_E_SDROPPEDDATAPKT | + INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SUNEXPERRPKTNUM | + INFINIPATH_E_SUNSUPVL | INFINIPATH_E_SENDBUFMISUSE | + INFINIPATH_E_SDMAGENMISMATCH | INFINIPATH_E_SDMAOUTOFBOUND | + INFINIPATH_E_SDMATAILOUTOFBOUND | INFINIPATH_E_SDMABASE | + INFINIPATH_E_SDMA1STDESC | INFINIPATH_E_SDMARPYTAG | + INFINIPATH_E_SDMADWEN | INFINIPATH_E_SDMAMISSINGDW | + INFINIPATH_E_SDMAUNEXPDATA | + INFINIPATH_E_IBSTATUSCHANGED | INFINIPATH_E_INVALIDADDR | + INFINIPATH_E_RESET | INFINIPATH_E_HARDWARE | + INFINIPATH_E_SDMADESCADDRMISALIGN | + INFINIPATH_E_INVALIDEEPCMD; + + dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; + dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; + dd->ipath_i_rcvavail_shift = INFINIPATH_I_RCVAVAIL_SHIFT; + dd->ipath_i_rcvurg_shift = INFINIPATH_I_RCVURG_SHIFT; + dd->ipath_flags |= IPATH_INTREG_64 | IPATH_HAS_MULT_IB_SPEED + | IPATH_HAS_LINK_LATENCY; + + /* + * EEPROM error log 0 is TXE Parity errors. 1 is RXE Parity. + * 2 is Some Misc, 3 is reserved for future. 
+ */
+	dd->ipath_eep_st_masks[0].hwerrs_to_log =
+		INFINIPATH_HWE_TXEMEMPARITYERR_MASK <<
+		INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT;
+
+	dd->ipath_eep_st_masks[1].hwerrs_to_log =
+		INFINIPATH_HWE_RXEMEMPARITYERR_MASK <<
+		INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT;
+
+	dd->ipath_eep_st_masks[2].errs_to_log = INFINIPATH_E_RESET;
+
+	ipath_linkrecovery = 0;
+
+	init_waitqueue_head(&dd->ipath_autoneg_wait);
+	INIT_DELAYED_WORK(&dd->ipath_autoneg_work, autoneg_work);
+
+	dd->ipath_link_width_supported = IB_WIDTH_1X | IB_WIDTH_4X;
+	dd->ipath_link_speed_supported = IPATH_IB_SDR | IPATH_IB_DDR;
+
+	dd->ipath_link_width_enabled = dd->ipath_link_width_supported;
+	dd->ipath_link_speed_enabled = dd->ipath_link_speed_supported;
+	/*
+	 * set the initial values to reasonable defaults; they will be set
+	 * for real when the link is up.
+	 */
+	dd->ipath_link_width_active = IB_WIDTH_4X;
+	dd->ipath_link_speed_active = IPATH_IB_SDR;
+	dd->delay_mult = rate_to_delay[0][1];
+}
+
+
+/*
+ * Set up the MSI stuff again after a reset. I'd like to just call
+ * pci_enable_msi() and request_irq() again, but when I do that,
+ * the MSI enable bit doesn't get set in the command word, and
+ * we switch to a different interrupt vector, which is confusing,
+ * so I instead just do it all inline. Perhaps we can somehow tie this
+ * into the PCIe hotplug support at some point.
+ * Note: because I'm doing it all here, I don't call pci_disable_msi()
+ * or free_irq() at the start of ipath_setup_7220_reset().
+ */ +static int ipath_reinit_msi(struct ipath_devdata *dd) +{ + int ret = 0; +#ifdef CONFIG_PCI_MSI + int pos; + u16 control; + if (!dd->ipath_msi_lo) /* Using intX, or init problem */ + goto bail; + + pos = pci_find_capability(dd->pcidev, PCI_CAP_ID_MSI); + if (!pos) { + ipath_dev_err(dd, "Can't find MSI capability, " + "can't restore MSI settings\n"); + goto bail; + } + ipath_cdbg(VERBOSE, "Writing msi_lo 0x%x to config offset 0x%x\n", + dd->ipath_msi_lo, pos + PCI_MSI_ADDRESS_LO); + pci_write_config_dword(dd->pcidev, pos + PCI_MSI_ADDRESS_LO, + dd->ipath_msi_lo); + ipath_cdbg(VERBOSE, "Writing msi_lo 0x%x to config offset 0x%x\n", + dd->ipath_msi_hi, pos + PCI_MSI_ADDRESS_HI); + pci_write_config_dword(dd->pcidev, pos + PCI_MSI_ADDRESS_HI, + dd->ipath_msi_hi); + pci_read_config_word(dd->pcidev, pos + PCI_MSI_FLAGS, &control); + if (!(control & PCI_MSI_FLAGS_ENABLE)) { + ipath_cdbg(VERBOSE, "MSI control at off %x was %x, " + "setting MSI enable (%x)\n", pos + PCI_MSI_FLAGS, + control, control | PCI_MSI_FLAGS_ENABLE); + control |= PCI_MSI_FLAGS_ENABLE; + pci_write_config_word(dd->pcidev, pos + PCI_MSI_FLAGS, + control); + } + /* now rewrite the data (vector) info */ + pci_write_config_word(dd->pcidev, pos + + ((control & PCI_MSI_FLAGS_64BIT) ? 12 : 8), + dd->ipath_msi_data); + ret = 1; +bail: +#endif + if (!ret) { + ipath_dbg("Using IntX, MSI disabled or not configured\n"); + ipath_enable_intx(dd->pcidev); + ret = 1; + } + /* + * We restore the cachelinesize also, although it doesn't really + * matter. + */ + pci_write_config_byte(dd->pcidev, PCI_CACHE_LINE_SIZE, + dd->ipath_pci_cacheline); + /* and now set the pci master bit again */ + pci_set_master(dd->pcidev); + + return ret; +} + +/* + * This routine sleeps, so it can only be called from user context, not + * from interrupt context. If we need interrupt context, we can split + * it into two routines. 
+ */ +static int ipath_setup_7220_reset(struct ipath_devdata *dd) +{ + u64 val; + int i; + int ret; + u16 cmdval; + + pci_read_config_word(dd->pcidev, PCI_COMMAND, &cmdval); + + /* Use dev_err so it shows up in logs, etc. */ + ipath_dev_err(dd, "Resetting InfiniPath unit %u\n", dd->ipath_unit); + + /* keep chip from being accessed in a few places */ + dd->ipath_flags &= ~(IPATH_INITTED | IPATH_PRESENT); + val = dd->ipath_control | INFINIPATH_C_RESET; + ipath_write_kreg(dd, dd->ipath_kregs->kr_control, val); + mb(); + + for (i = 1; i <= 5; i++) { + int r; + + /* + * Allow MBIST, etc. to complete; longer on each retry. + * We sometimes get machine checks from bus timeout if no + * response, so for now, make it *really* long. + */ + msleep(1000 + (1 + i) * 2000); + r = pci_write_config_dword(dd->pcidev, PCI_BASE_ADDRESS_0, + dd->ipath_pcibar0); + if (r) + ipath_dev_err(dd, "rewrite of BAR0 failed: %d\n", r); + r = pci_write_config_dword(dd->pcidev, PCI_BASE_ADDRESS_1, + dd->ipath_pcibar1); + if (r) + ipath_dev_err(dd, "rewrite of BAR1 failed: %d\n", r); + /* now re-enable memory access */ + pci_write_config_word(dd->pcidev, PCI_COMMAND, cmdval); + r = pci_enable_device(dd->pcidev); + if (r) + ipath_dev_err(dd, "pci_enable_device failed after " + "reset: %d\n", r); + /* + * whether it fully enabled or not, mark as present, + * again (but not INITTED) + */ + dd->ipath_flags |= IPATH_PRESENT; + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_revision); + if (val == dd->ipath_revision) { + ipath_cdbg(VERBOSE, "Got matching revision " + "register %llx on try %d\n", + (unsigned long long) val, i); + ret = ipath_reinit_msi(dd); + goto bail; + } + /* Probably getting -1 back */ + ipath_dbg("Didn't get expected revision register, " + "got %llx, try %d\n", (unsigned long long) val, + i + 1); + } + ret = 0; /* failed */ + +bail: + if (ret) + ipath_7220_pcie_params(dd, dd->ipath_boardrev); + + return ret; +} + +/** + * ipath_7220_put_tid - write a TID to the chip + * @dd: the 
infinipath device
+ * @tidptr: pointer to the expected TID (in chip) to update
+ * @type: 0 for eager, 1 for expected
+ * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing
+ *
+ * This exists as a separate routine to allow for selection of the
+ * appropriate "flavor". The static calls in cleanup just use the
+ * revision-agnostic form, as they are not performance critical.
+ */
+static void ipath_7220_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr,
+			       u32 type, unsigned long pa)
+{
+	if (pa != dd->ipath_tidinvalid) {
+		u64 chippa = pa >> IBA7220_TID_PA_SHIFT;
+
+		/* paranoia checks */
+		if (pa != (chippa << IBA7220_TID_PA_SHIFT)) {
+			dev_info(&dd->pcidev->dev, "BUG: physaddr %lx "
+				 "not 2KB aligned!\n", pa);
+			return;
+		}
+		if (pa >= (1UL << IBA7220_TID_SZ_SHIFT)) {
+			ipath_dev_err(dd,
+				      "BUG: Physical page address 0x%lx "
+				      "larger than supported\n", pa);
+			return;
+		}
+
+		if (type == RCVHQ_RCV_TYPE_EAGER)
+			chippa |= dd->ipath_tidtemplate;
+		else /* for now, always full 4KB page */
+			chippa |= IBA7220_TID_SZ_4K;
+		writeq(chippa, tidptr);
+	} else
+		writeq(pa, tidptr);
+	mmiowb();
+}
+
+/**
+ * ipath_7220_clear_tids - clear all TID entries for a port, expected and eager
+ * @dd: the infinipath device
+ * @port: the port
+ *
+ * clear all TID entries for a port, expected and eager.
+ * Used from ipath_close().
On this chip, TIDs are only 32 bits,
+ * not 64, but they are still on 64 bit boundaries, so tidbase
+ * is declared as u64 * for the pointer math, even though we write 32 bits
+ */
+static void ipath_7220_clear_tids(struct ipath_devdata *dd, unsigned port)
+{
+	u64 __iomem *tidbase;
+	unsigned long tidinv;
+	int i;
+
+	if (!dd->ipath_kregbase)
+		return;
+
+	ipath_cdbg(VERBOSE, "Invalidate TIDs for port %u\n", port);
+
+	tidinv = dd->ipath_tidinvalid;
+	tidbase = (u64 __iomem *)
+		((char __iomem *)(dd->ipath_kregbase) +
+		 dd->ipath_rcvtidbase +
+		 port * dd->ipath_rcvtidcnt * sizeof(*tidbase));
+
+	for (i = 0; i < dd->ipath_rcvtidcnt; i++)
+		ipath_7220_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EXPECTED,
+				   tidinv);
+
+	tidbase = (u64 __iomem *)
+		((char __iomem *)(dd->ipath_kregbase) +
+		 dd->ipath_rcvegrbase + port_egrtid_idx(dd, port)
+		 * sizeof(*tidbase));
+
+	for (i = port ? dd->ipath_rcvegrcnt : dd->ipath_p0_rcvegrcnt; i; i--)
+		ipath_7220_put_tid(dd, &tidbase[i-1], RCVHQ_RCV_TYPE_EAGER,
+				   tidinv);
+}
+
+/**
+ * ipath_7220_tidtemplate - setup constants for TID updates
+ * @dd: the infinipath device
+ *
+ * We set up stuff that we use a lot, to avoid calculating each time
+ */
+static void ipath_7220_tidtemplate(struct ipath_devdata *dd)
+{
+	/* For now, we always allocate 4KB buffers (at init) so we can
+	 * receive max size packets. We may want a module parameter to
+	 * specify 2KB or 4KB and/or make it per port instead of per device
+	 * for those who want to reduce memory footprint. Note that the
+	 * ipath_rcvhdrentsize size must be large enough to hold the largest
+	 * IB header (currently 96 bytes) that we expect to handle (plus of
+	 * course the 2 dwords of RHF).
+ */ + if (dd->ipath_rcvegrbufsize == 2048) + dd->ipath_tidtemplate = IBA7220_TID_SZ_2K; + else if (dd->ipath_rcvegrbufsize == 4096) + dd->ipath_tidtemplate = IBA7220_TID_SZ_4K; + else { + dev_info(&dd->pcidev->dev, "BUG: unsupported egrbufsize " + "%u, using %u\n", dd->ipath_rcvegrbufsize, + 4096); + dd->ipath_tidtemplate = IBA7220_TID_SZ_4K; + } + dd->ipath_tidinvalid = 0; +} + +static int ipath_7220_early_init(struct ipath_devdata *dd) +{ + u32 i, s; + + if (strcmp(int_type, "auto") && + strcmp(int_type, "force_msi") && + strcmp(int_type, "force_intx")) { + ipath_dev_err(dd, "Invalid interrupt_type: '%s', expecting " + "auto, force_msi or force_intx\n", int_type); + return -EINVAL; + } + + /* + * Control[4] has been added to change the arbitration within + * the SDMA engine between favoring data fetches over descriptor + * fetches. ipath_sdma_fetch_arb==0 gives data fetches priority. + */ + if (ipath_sdma_fetch_arb && (dd->ipath_minrev > 1)) + dd->ipath_control |= 1<<4; + + dd->ipath_flags |= IPATH_4BYTE_TID; + + /* + * For openfabrics, we need to be able to handle an IB header of + * 24 dwords. HT chip has arbitrary sized receive buffers, so we + * made them the same size as the PIO buffers. This chip does not + * handle arbitrary size buffers, so we need the header large enough + * to handle largest IB header, but still have room for a 2KB MTU + * standard IB packet. + */ + dd->ipath_rcvhdrentsize = 24; + dd->ipath_rcvhdrsize = IPATH_DFLT_RCVHDRSIZE; + dd->ipath_rhf_offset = + dd->ipath_rcvhdrentsize - sizeof(u64) / sizeof(u32); + + dd->ipath_rcvegrbufsize = ipath_mtu4096 ? 4096 : 2048; + /* + * the min() check here is currently a nop, but it may not always + * be, depending on just how we do ipath_rcvegrbufsize + */ + dd->ipath_ibmaxlen = min(ipath_mtu4096 ? 
dd->ipath_piosize4k :
+				 dd->ipath_piosize2k,
+				 dd->ipath_rcvegrbufsize +
+				 (dd->ipath_rcvhdrentsize << 2));
+	dd->ipath_init_ibmaxlen = dd->ipath_ibmaxlen;
+
+	ipath_7220_config_jint(dd, INFINIPATH_JINT_DEFAULT_IDLE_TICKS,
+			       INFINIPATH_JINT_DEFAULT_MAX_PACKETS);
+
+	if (dd->ipath_boardrev)		/* no eeprom on emulator */
+		ipath_get_eeprom_info(dd);
+
+	/* start of code to check and print procmon */
+	s = ipath_read_kreg32(dd, IPATH_KREG_OFFSET(ProcMon));
+	s &= ~(1U<<31);	/* clear done bit */
+	s |= 1U<<14;	/* clear counter (write 1 to clear) */
+	ipath_write_kreg(dd, IPATH_KREG_OFFSET(ProcMon), s);
+	/* make sure clear_counter low long enough before start */
+	ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+	ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+
+	s &= ~(1U<<14);	/* allow counter to count (before starting) */
+	ipath_write_kreg(dd, IPATH_KREG_OFFSET(ProcMon), s);
+	ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+	ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+	s = ipath_read_kreg32(dd, IPATH_KREG_OFFSET(ProcMon));
+
+	s |= 1U<<15;	/* start the counter */
+	s &= ~(1U<<31);	/* clear done bit */
+	s &= ~0x7ffU;	/* clear frequency bits */
+	s |= 0xe29;	/* set frequency bits, in case cleared */
+	ipath_write_kreg(dd, IPATH_KREG_OFFSET(ProcMon), s);
+
+	s = 0;
+	for (i = 500; i > 0 && !(s&(1ULL<<31)); i--) {
+		ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+		s = ipath_read_kreg32(dd, IPATH_KREG_OFFSET(ProcMon));
+	}
+	if (!(s&(1U<<31)))
+		ipath_dev_err(dd, "ProcMon register not valid: 0x%x\n", s);
+	else
+		ipath_dbg("ProcMon=0x%x, count=0x%x\n", s, (s>>16)&0x1ff);
+
+	return 0;
+}
+
+/**
+ * ipath_7220_get_base_info - set chip-specific flags for user code
+ * @pd: the infinipath port
+ * @kbase: ipath_base_info pointer
+ *
+ * We set the PCIE flag because the lower bandwidth on PCIe vs
+ * HyperTransport can affect some user packet algorithms.
+ */ +static int ipath_7220_get_base_info(struct ipath_portdata *pd, void *kbase) +{ + struct ipath_base_info *kinfo = kbase; + + kinfo->spi_runtime_flags |= + IPATH_RUNTIME_PCIE | IPATH_RUNTIME_NODMA_RTAIL | + IPATH_RUNTIME_SDMA; + + return 0; +} + +static void ipath_7220_free_irq(struct ipath_devdata *dd) +{ + free_irq(dd->ipath_irq, dd); + dd->ipath_irq = 0; +} + +static struct ipath_message_header * +ipath_7220_get_msgheader(struct ipath_devdata *dd, __le32 *rhf_addr) +{ + u32 offset = ipath_hdrget_offset(rhf_addr); + + return (struct ipath_message_header *) + (rhf_addr - dd->ipath_rhf_offset + offset); +} + +static void ipath_7220_config_ports(struct ipath_devdata *dd, ushort cfgports) +{ + u32 nchipports; + + nchipports = ipath_read_kreg32(dd, dd->ipath_kregs->kr_portcnt); + if (!cfgports) { + int ncpus = num_online_cpus(); + + if (ncpus <= 4) + dd->ipath_portcnt = 5; + else if (ncpus <= 8) + dd->ipath_portcnt = 9; + if (dd->ipath_portcnt) + ipath_dbg("Auto-configured for %u ports, %d cpus " + "online\n", dd->ipath_portcnt, ncpus); + } else if (cfgports <= nchipports) + dd->ipath_portcnt = cfgports; + if (!dd->ipath_portcnt) /* none of the above, set to max */ + dd->ipath_portcnt = nchipports; + /* + * chip can be configured for 5, 9, or 17 ports, and choice + * affects number of eager TIDs per port (1K, 2K, 4K). 
+ */ + if (dd->ipath_portcnt > 9) + dd->ipath_rcvctrl |= 2ULL << IBA7220_R_PORTCFG_SHIFT; + else if (dd->ipath_portcnt > 5) + dd->ipath_rcvctrl |= 1ULL << IBA7220_R_PORTCFG_SHIFT; + /* else configure for default 5 receive ports */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, + dd->ipath_rcvctrl); + dd->ipath_p0_rcvegrcnt = 2048; /* always */ + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + dd->ipath_pioreserved = 1; /* reserve a buffer */ +} + + +static int ipath_7220_get_ib_cfg(struct ipath_devdata *dd, int which) +{ + int lsb, ret = 0; + u64 maskr; /* right-justified mask */ + + switch (which) { + case IPATH_IB_CFG_HRTBT: /* Get Heartbeat off/enable/auto */ + lsb = IBA7220_IBC_HRTBT_SHIFT; + maskr = IBA7220_IBC_HRTBT_MASK; + break; + + case IPATH_IB_CFG_LWID_ENB: /* Get allowed Link-width */ + ret = dd->ipath_link_width_enabled; + goto done; + + case IPATH_IB_CFG_LWID: /* Get currently active Link-width */ + ret = dd->ipath_link_width_active; + goto done; + + case IPATH_IB_CFG_SPD_ENB: /* Get allowed Link speeds */ + ret = dd->ipath_link_speed_enabled; + goto done; + + case IPATH_IB_CFG_SPD: /* Get current Link spd */ + ret = dd->ipath_link_speed_active; + goto done; + + case IPATH_IB_CFG_RXPOL_ENB: /* Get Auto-RX-polarity enable */ + lsb = IBA7220_IBC_RXPOL_SHIFT; + maskr = IBA7220_IBC_RXPOL_MASK; + break; + + case IPATH_IB_CFG_LREV_ENB: /* Get Auto-Lane-reversal enable */ + lsb = IBA7220_IBC_LREV_SHIFT; + maskr = IBA7220_IBC_LREV_MASK; + break; + + case IPATH_IB_CFG_LINKLATENCY: + ret = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcddrstatus) + & IBA7220_DDRSTAT_LINKLAT_MASK; + goto done; + + default: + ret = -ENOTSUPP; + goto done; + } + ret = (int)((dd->ipath_ibcddrctrl >> lsb) & maskr); +done: + return ret; +} + +static int ipath_7220_set_ib_cfg(struct ipath_devdata *dd, int which, u32 val) +{ + int lsb, ret = 0, setforce = 0; + u64 maskr; /* right-justified mask */ + + switch (which) { + case IPATH_IB_CFG_LIDLMC: + /* + * Set LID and LMC. 
Combined to avoid possible hazard;
+		 * caller puts LMC in 16MSbits, DLID in 16LSbits of val
+		 */
+		lsb = IBA7220_IBC_DLIDLMC_SHIFT;
+		maskr = IBA7220_IBC_DLIDLMC_MASK;
+		break;
+
+	case IPATH_IB_CFG_HRTBT: /* set Heartbeat off/enable/auto */
+		if (val & IPATH_IB_HRTBT_ON &&
+		    (dd->ipath_flags & IPATH_NO_HRTBT))
+			goto bail;
+		lsb = IBA7220_IBC_HRTBT_SHIFT;
+		maskr = IBA7220_IBC_HRTBT_MASK;
+		break;
+
+	case IPATH_IB_CFG_LWID_ENB: /* set allowed Link-width */
+		/*
+		 * As with speed, only write the actual register if
+		 * the link is currently down; otherwise it takes effect
+		 * on the next link change.
+		 */
+		dd->ipath_link_width_enabled = val;
+		if ((dd->ipath_flags & (IPATH_LINKDOWN|IPATH_LINKINIT)) !=
+		    IPATH_LINKDOWN)
+			goto bail;
+		/*
+		 * We set the IPATH_IB_FORCE_NOTIFY bit so updown
+		 * will get called, because we want to update
+		 * link_width_active, and the change may not take
+		 * effect for some time (if we are in POLL), so this
+		 * flag will force the updown routine to be called
+		 * on the next ibstatuschange down interrupt, even
+		 * if it's not a down->up transition.
+		 */
+		val--; /* convert from IB to chip */
+		maskr = IBA7220_IBC_WIDTH_MASK;
+		lsb = IBA7220_IBC_WIDTH_SHIFT;
+		setforce = 1;
+		dd->ipath_flags |= IPATH_IB_FORCE_NOTIFY;
+		break;
+
+	case IPATH_IB_CFG_SPD_ENB: /* set allowed Link speeds */
+		/*
+		 * If we turn off IB1.2, we need to preset SerDes defaults,
+		 * but not right now. Set a flag for the next time
+		 * we command the link down. As with width, only write the
+		 * actual register if the link is currently down; otherwise
+		 * it takes effect on the next link change. Since the setting
+		 * is being explicitly requested (via MAD or sysfs), clear
+		 * autoneg failure status if speed autoneg is enabled.
+ */ + dd->ipath_link_speed_enabled = val; + if (dd->ipath_ibcddrctrl & IBA7220_IBC_IBTA_1_2_MASK && + !(val & (val - 1))) + dd->ipath_presets_needed = 1; + if ((dd->ipath_flags & (IPATH_LINKDOWN|IPATH_LINKINIT)) != + IPATH_LINKDOWN) + goto bail; + /* + * We set the IPATH_IB_FORCE_NOTIFY bit so updown + * will get called because we want to update + * link_speed_active, and the change may not take + * effect for some time (if we are in POLL), so this + * flag will force the updown routine to be called + * on the next ibstatuschange down interrupt, even + * if it's not a down->up transition. When setting + * speed autoneg, clear AUTONEG_FAILED. + */ + if (val == (IPATH_IB_SDR | IPATH_IB_DDR)) { + val = IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK; + dd->ipath_flags &= ~IPATH_IB_AUTONEG_FAILED; + } else + val = val == IPATH_IB_DDR ? IBA7220_IBC_SPEED_DDR + : IBA7220_IBC_SPEED_SDR; + maskr = IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK; + lsb = 0; /* speed bits are low bits */ + setforce = 1; + break; + + case IPATH_IB_CFG_RXPOL_ENB: /* set Auto-RX-polarity enable */ + lsb = IBA7220_IBC_RXPOL_SHIFT; + maskr = IBA7220_IBC_RXPOL_MASK; + break; + + case IPATH_IB_CFG_LREV_ENB: /* set Auto-Lane-reversal enable */ + lsb = IBA7220_IBC_LREV_SHIFT; + maskr = IBA7220_IBC_LREV_MASK; + break; + + default: + ret = -ENOTSUPP; + goto bail; + } + dd->ipath_ibcddrctrl &= ~(maskr << lsb); + dd->ipath_ibcddrctrl |= (((u64) val & maskr) << lsb); + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcddrctrl, + dd->ipath_ibcddrctrl); + if (setforce) + dd->ipath_flags |= IPATH_IB_FORCE_NOTIFY; +bail: + return ret; +} + +static void ipath_7220_read_counters(struct ipath_devdata *dd, + struct infinipath_counters *cntrs) +{ + u64 *counters = (u64 *) cntrs; + int i; + + for (i = 0; i < sizeof(*cntrs) / sizeof(u64); i++) + counters[i] = ipath_snap_cntr(dd, i); +} + +/* if we are using MSI, try to fall back to IntX */ +static int ipath_7220_intr_fallback(struct ipath_devdata
*dd) +{ + if (dd->ipath_msi_lo) { + dev_info(&dd->pcidev->dev, "MSI interrupt not detected," + " trying IntX interrupts\n"); + ipath_7220_nomsi(dd); + ipath_enable_intx(dd->pcidev); + /* + * some newer kernels require free_irq before disable_msi, + * and irq can be changed during disable and intx enable + * and we need to therefore use the pcidev->irq value, + * not our saved MSI value. + */ + dd->ipath_irq = dd->pcidev->irq; + if (request_irq(dd->ipath_irq, ipath_intr, IRQF_SHARED, + IPATH_DRV_NAME, dd)) + ipath_dev_err(dd, + "Could not re-request_irq for IntX\n"); + return 1; + } + return 0; +} + +/* + * reset the XGXS (between serdes and IBC). Slightly less intrusive + * than resetting the IBC or external link state, and useful in some + * cases to cause some retraining. To do this right, we reset IBC + * as well. + */ +static void ipath_7220_xgxs_reset(struct ipath_devdata *dd) +{ + u64 val, prev_val; + + prev_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig); + val = prev_val | INFINIPATH_XGXS_RESET; + prev_val &= ~INFINIPATH_XGXS_RESET; /* be sure */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_control, + dd->ipath_control & ~INFINIPATH_C_LINKENABLE); + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, prev_val); + ipath_write_kreg(dd, dd->ipath_kregs->kr_control, + dd->ipath_control); +} + + +/* Still needs cleanup, too much hardwired stuff */ +static void autoneg_send(struct ipath_devdata *dd, + u32 *hdr, u32 dcnt, u32 *data) +{ + int i; + u64 cnt; + u32 __iomem *piobuf; + u32 pnum; + + i = 0; + cnt = 7 + dcnt + 1; /* 7 dword header, dword data, icrc */ + while (!(piobuf = ipath_getpiobuf(dd, cnt, &pnum))) { + if (i++ > 15) { + ipath_dbg("Couldn't get pio buffer for send\n"); + return; + } + udelay(2); + } + if (dd->ipath_flags&IPATH_HAS_PBC_CNT) + cnt |= 0x80000000UL<<32; /* mark as VL15 */ + writeq(cnt, piobuf); + ipath_flush_wc(); 
+ __iowrite32_copy(piobuf + 2, hdr, 7); + __iowrite32_copy(piobuf + 9, data, dcnt); + ipath_flush_wc(); +} + +/* + * _start packet gets sent twice at start, _done gets sent twice at end + */ +static void ipath_autoneg_send(struct ipath_devdata *dd, int which) +{ + static u32 swapped; + u32 dw, i, hcnt, dcnt, *data; + static u32 hdr[7] = { 0xf002ffff, 0x48ffff, 0x6400abba }; + static u32 madpayload_start[0x40] = { + 0x1810103, 0x1, 0x0, 0x0, 0x2c90000, 0x2c9, 0x0, 0x0, + 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x1, 0x1388, 0x15e, 0x1, /* rest 0's */ + }; + static u32 madpayload_done[0x40] = { + 0x1810103, 0x1, 0x0, 0x0, 0x2c90000, 0x2c9, 0x0, 0x0, + 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x40000001, 0x1388, 0x15e, /* rest 0's */ + }; + dcnt = sizeof(madpayload_start)/sizeof(madpayload_start[0]); + hcnt = sizeof(hdr)/sizeof(hdr[0]); + if (!swapped) { + /* for maintainability, do it at runtime */ + for (i = 0; i < hcnt; i++) { + dw = (__force u32) cpu_to_be32(hdr[i]); + hdr[i] = dw; + } + for (i = 0; i < dcnt; i++) { + dw = (__force u32) cpu_to_be32(madpayload_start[i]); + madpayload_start[i] = dw; + dw = (__force u32) cpu_to_be32(madpayload_done[i]); + madpayload_done[i] = dw; + } + swapped = 1; + } + + data = which ? madpayload_done : madpayload_start; + ipath_cdbg(PKT, "Sending %s special MADs\n", which?"done":"start"); + + autoneg_send(dd, hdr, dcnt, data); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + udelay(2); + autoneg_send(dd, hdr, dcnt, data); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + udelay(2); +} + + + +/* + * Do the absolute minimum to cause an IB speed change, and make it + * ready, but don't actually trigger the change. 
The caller will + * do that when ready (if link is in Polling training state, it will + * happen immediately, otherwise when link next goes down) + * + * This routine should only be used as part of the DDR autonegotiation + * code for devices that are not compliant with IB 1.2 (or code that + * fixes things up for same). + * + * When link has gone down, and autoneg enabled, or autoneg has + * failed and we give up until next time we set both speeds, and + * then we want IBTA enabled as well as "use max enabled speed". + */ +static void set_speed_fast(struct ipath_devdata *dd, u32 speed) +{ + dd->ipath_ibcddrctrl &= ~(IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK | + (IBA7220_IBC_WIDTH_MASK << IBA7220_IBC_WIDTH_SHIFT)); + + if (speed == (IPATH_IB_SDR | IPATH_IB_DDR)) + dd->ipath_ibcddrctrl |= IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK; + else + dd->ipath_ibcddrctrl |= speed == IPATH_IB_DDR ? + IBA7220_IBC_SPEED_DDR : IBA7220_IBC_SPEED_SDR; + + /* + * Convert from IB-style 1 = 1x, 2 = 4x, 3 = auto + * to chip-centric 0 = 1x, 1 = 4x, 2 = auto + */ + dd->ipath_ibcddrctrl |= (u64)(dd->ipath_link_width_enabled - 1) << + IBA7220_IBC_WIDTH_SHIFT; + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcddrctrl, + dd->ipath_ibcddrctrl); + ipath_cdbg(VERBOSE, "setup for IB speed (%x) done\n", speed); +} + + +/* + * This routine is only used when we are not talking to another + * IB 1.2-compliant device that we think can do DDR. + * (This includes all existing switch chips as of Oct 2007.) + * 1.2-compliant devices go directly to DDR prior to reaching INIT + */ +static void try_auto_neg(struct ipath_devdata *dd) +{ + /* + * required for older non-IB1.2 DDR switches. Newer + * non-IB-compliant switches don't need it, but so far, + * aren't bothered by it either.
"Magic constant" + */ + ipath_write_kreg(dd, IPATH_KREG_OFFSET(IBNCModeCtrl), + 0x3b9dc07); + dd->ipath_flags |= IPATH_IB_AUTONEG_INPROG; + ipath_autoneg_send(dd, 0); + set_speed_fast(dd, IPATH_IB_DDR); + ipath_toggle_rclkrls(dd); + /* 2 msec is minimum length of a poll cycle */ + schedule_delayed_work(&dd->ipath_autoneg_work, + msecs_to_jiffies(2)); +} + + +static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) +{ + int ret = 0; + u32 ltstate = ipath_ib_linkstate(dd, ibcs); + + dd->ipath_link_width_active = + ((ibcs >> IBA7220_IBCS_LINKWIDTH_SHIFT) & 1) ? + IB_WIDTH_4X : IB_WIDTH_1X; + dd->ipath_link_speed_active = + ((ibcs >> IBA7220_IBCS_LINKSPEED_SHIFT) & 1) ? + IPATH_IB_DDR : IPATH_IB_SDR; + + if (!ibup) { + /* + * when link goes down we don't want aeq running, so it + * won't't interfere with IBC training, etc., and we need + * to go back to the static SerDes preset values + */ + if (dd->ipath_x1_fix_tries && + ltstate <= INFINIPATH_IBCS_LT_STATE_SLEEPQUIET && + ltstate != INFINIPATH_IBCS_LT_STATE_LINKUP) + dd->ipath_x1_fix_tries = 0; + if (!(dd->ipath_flags & (IPATH_IB_AUTONEG_FAILED | + IPATH_IB_AUTONEG_INPROG))) + set_speed_fast(dd, dd->ipath_link_speed_enabled); + if (!(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG)) { + ipath_cdbg(VERBOSE, "Setting RXEQ defaults\n"); + ipath_sd7220_presets(dd); + } + /* this might better in ipath_sd7220_presets() */ + ipath_set_relock_poll(dd, ibup); + } else { + if (ipath_compat_ddr_negotiate && + !(dd->ipath_flags & (IPATH_IB_AUTONEG_FAILED | + IPATH_IB_AUTONEG_INPROG)) && + dd->ipath_link_speed_active == IPATH_IB_SDR && + (dd->ipath_link_speed_enabled & + (IPATH_IB_DDR | IPATH_IB_SDR)) == + (IPATH_IB_DDR | IPATH_IB_SDR) && + dd->ipath_autoneg_tries < IPATH_AUTONEG_TRIES) { + /* we are SDR, and DDR auto-negotiation enabled */ + ++dd->ipath_autoneg_tries; + ipath_dbg("DDR negotiation try, %u/%u\n", + dd->ipath_autoneg_tries, + IPATH_AUTONEG_TRIES); + try_auto_neg(dd); + ret = 1; /* no other IB status 
change processing */ + } else if ((dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) + && dd->ipath_link_speed_active == IPATH_IB_SDR) { + ipath_autoneg_send(dd, 1); + set_speed_fast(dd, IPATH_IB_DDR); + udelay(2); + ipath_toggle_rclkrls(dd); + ret = 1; /* no other IB status change processing */ + } else { + if ((dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) && + (dd->ipath_link_speed_active & IPATH_IB_DDR)) { + ipath_dbg("Got to INIT with DDR autoneg\n"); + dd->ipath_flags &= ~(IPATH_IB_AUTONEG_INPROG + | IPATH_IB_AUTONEG_FAILED); + dd->ipath_autoneg_tries = 0; + /* re-enable SDR, for next link down */ + set_speed_fast(dd, + dd->ipath_link_speed_enabled); + wake_up(&dd->ipath_autoneg_wait); + } else if (dd->ipath_flags & IPATH_IB_AUTONEG_FAILED) { + /* + * clear autoneg failure flag, and do setup + * so we'll try next time link goes down and + * back to INIT (possibly connected to different + * device). + */ + ipath_dbg("INIT %sDR after autoneg failure\n", + (dd->ipath_link_speed_active & + IPATH_IB_DDR) ? "D" : "S"); + dd->ipath_flags &= ~IPATH_IB_AUTONEG_FAILED; + dd->ipath_ibcddrctrl |= + IBA7220_IBC_IBTA_1_2_MASK; + ipath_write_kreg(dd, + IPATH_KREG_OFFSET(IBNCModeCtrl), 0); + } + } + /* + * if we are in 1X, and are in autoneg width, it + * could be due to an xgxs problem, so if we haven't + * already tried, try twice to get to 4X; if we + * tried, and couldn't, report it, since it will + * probably not be what is desired. 
+ */ + if ((dd->ipath_link_width_enabled & (IB_WIDTH_1X | + IB_WIDTH_4X)) == (IB_WIDTH_1X | IB_WIDTH_4X) + && dd->ipath_link_width_active == IB_WIDTH_1X + && dd->ipath_x1_fix_tries < 3) { + if (++dd->ipath_x1_fix_tries == 3) + dev_info(&dd->pcidev->dev, + "IB link is in 1X mode\n"); + else { + ipath_cdbg(VERBOSE, "IB 1X in " + "auto-width, try %u to be " + "sure it's really 1X; " + "ltstate %u\n", + dd->ipath_x1_fix_tries, + ltstate); + dd->ipath_f_xgxs_reset(dd); + ret = 1; /* skip other processing */ + } + } + + if (!ret) { + dd->delay_mult = rate_to_delay + [(ibcs >> IBA7220_IBCS_LINKSPEED_SHIFT) & 1] + [(ibcs >> IBA7220_IBCS_LINKWIDTH_SHIFT) & 1]; + + ipath_set_relock_poll(dd, ibup); + } + } + + if (!ret) + ipath_setup_7220_setextled(dd, ipath_ib_linkstate(dd, ibcs), + ltstate); + return ret; +} + + +/* + * Handle the empirically determined mechanism for auto-negotiation + * of DDR speed with switches. + */ +static void autoneg_work(struct work_struct *work) +{ + struct ipath_devdata *dd; + u64 startms; + u32 lastlts, i; + + dd = container_of(work, struct ipath_devdata, + ipath_autoneg_work.work); + + startms = jiffies_to_msecs(jiffies); + + /* + * busy wait for this first part, it should be at most a + * few hundred usec, since we scheduled ourselves for 2msec. 
+ */ + for (i = 0; i < 25; i++) { + lastlts = ipath_ib_linktrstate(dd, dd->ipath_lastibcstat); + if (lastlts == INFINIPATH_IBCS_LT_STATE_POLLQUIET) { + ipath_set_linkstate(dd, IPATH_IB_LINKDOWN_DISABLE); + break; + } + udelay(100); + } + + if (!(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG)) + goto done; /* we got there early or told to stop */ + + /* we expect this to timeout */ + if (wait_event_timeout(dd->ipath_autoneg_wait, + !(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG), + msecs_to_jiffies(90))) + goto done; + + ipath_toggle_rclkrls(dd); + + /* we expect this to timeout */ + if (wait_event_timeout(dd->ipath_autoneg_wait, + !(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG), + msecs_to_jiffies(1700))) + goto done; + + set_speed_fast(dd, IPATH_IB_SDR); + ipath_toggle_rclkrls(dd); + + /* + * wait up to 250 msec for link to train and get to INIT at DDR; + * this should terminate early + */ + wait_event_timeout(dd->ipath_autoneg_wait, + !(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG), + msecs_to_jiffies(250)); +done: + if (dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) { + ipath_dbg("Did not get to DDR INIT (%x) after %Lu msecs\n", + ipath_ib_state(dd, dd->ipath_lastibcstat), + jiffies_to_msecs(jiffies)-startms); + dd->ipath_flags &= ~IPATH_IB_AUTONEG_INPROG; + if (dd->ipath_autoneg_tries == IPATH_AUTONEG_TRIES) { + dd->ipath_flags |= IPATH_IB_AUTONEG_FAILED; + ipath_dbg("Giving up on DDR until next IB " + "link Down\n"); + dd->ipath_autoneg_tries = 0; + } + set_speed_fast(dd, dd->ipath_link_speed_enabled); + } +} + + +/** + * ipath_init_iba7220_funcs - set up the chip-specific function pointers + * @dd: the infinipath device + * + * This is global, and is called directly at init to set up the + * chip-specific function pointers for later use. 
+ */ +void ipath_init_iba7220_funcs(struct ipath_devdata *dd) +{ + dd->ipath_f_intrsetup = ipath_7220_intconfig; + dd->ipath_f_bus = ipath_setup_7220_config; + dd->ipath_f_reset = ipath_setup_7220_reset; + dd->ipath_f_get_boardname = ipath_7220_boardname; + dd->ipath_f_init_hwerrors = ipath_7220_init_hwerrors; + dd->ipath_f_early_init = ipath_7220_early_init; + dd->ipath_f_handle_hwerrors = ipath_7220_handle_hwerrors; + dd->ipath_f_quiet_serdes = ipath_7220_quiet_serdes; + dd->ipath_f_bringup_serdes = ipath_7220_bringup_serdes; + dd->ipath_f_clear_tids = ipath_7220_clear_tids; + dd->ipath_f_put_tid = ipath_7220_put_tid; + dd->ipath_f_cleanup = ipath_setup_7220_cleanup; + dd->ipath_f_setextled = ipath_setup_7220_setextled; + dd->ipath_f_get_base_info = ipath_7220_get_base_info; + dd->ipath_f_free_irq = ipath_7220_free_irq; + dd->ipath_f_tidtemplate = ipath_7220_tidtemplate; + dd->ipath_f_intr_fallback = ipath_7220_intr_fallback; + dd->ipath_f_xgxs_reset = ipath_7220_xgxs_reset; + dd->ipath_f_get_ib_cfg = ipath_7220_get_ib_cfg; + dd->ipath_f_set_ib_cfg = ipath_7220_set_ib_cfg; + dd->ipath_f_config_jint = ipath_7220_config_jint; + dd->ipath_f_config_ports = ipath_7220_config_ports; + dd->ipath_f_read_counters = ipath_7220_read_counters; + dd->ipath_f_get_msgheader = ipath_7220_get_msgheader; + dd->ipath_f_ib_updown = ipath_7220_ib_updown; + + /* initialize chip-specific variables */ + ipath_init_7220_variables(dd); +} From ralph.campbell at qlogic.com Wed Apr 2 15:50:08 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:08 -0700 Subject: [ofa-general] [PATCH 13/20] IB/ipath -- support for SerDes portion of IBA7220 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225007.28598.50581.stgit@eng-46.mv.qlogic.com> From: Michael Albaugh The control and initialization of the SerDes blocks of the IBA7220 is sufficiently complex to 
merit a separate file. This is that file. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_sd7220.c | 1462 ++++++++++++++++++++++++++++ 1 files changed, 1462 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_sd7220.c b/drivers/infiniband/hw/ipath/ipath_sd7220.c new file mode 100644 index 0000000..aa47eb5 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_sd7220.c @@ -0,0 +1,1462 @@ +/* + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +/* + * This file contains all of the code that is specific to the SerDes + * on the InfiniPath 7220 chip. + */ + +#include +#include + +#include "ipath_kernel.h" +#include "ipath_registers.h" +#include "ipath_7220.h" + +/* + * The IBSerDesMappTable is a memory that holds values to be stored in + * various SerDes registers by IBC. It is not part of the normal kregs + * map and is used in exactly one place, hence the #define below. + */ +#define KR_IBSerDesMappTable (0x94000 / (sizeof(uint64_t))) + +/* + * Below used for sdnum parameter, selecting one of the two sections + * used for PCIe, or the single SerDes used for IB. + */ +#define PCIE_SERDES0 0 +#define PCIE_SERDES1 1 + +/* + * The EPB requires addressing in a particular form. EPB_LOC() is intended + * to make #definitions a little more readable. + */ +#define EPB_ADDR_SHF 8 +#define EPB_LOC(chn, elt, reg) \ + (((elt & 0xf) | ((chn & 7) << 4) | ((reg & 0x3f) << 9)) << \ + EPB_ADDR_SHF) +#define EPB_IB_QUAD0_CS_SHF (25) +#define EPB_IB_QUAD0_CS (1U << EPB_IB_QUAD0_CS_SHF) +#define EPB_IB_UC_CS_SHF (26) +#define EPB_PCIE_UC_CS_SHF (27) +#define EPB_GLOBAL_WR (1U << (EPB_ADDR_SHF + 8)) + +/* Forward declarations. 
*/ +static int ipath_sd7220_reg_mod(struct ipath_devdata *dd, int sdnum, u32 loc, + u32 data, u32 mask); +static int ibsd_mod_allchnls(struct ipath_devdata *dd, int loc, int val, + int mask); +static int ipath_sd_trimdone_poll(struct ipath_devdata *dd); +static void ipath_sd_trimdone_monitor(struct ipath_devdata *dd, + const char *where); +static int ipath_sd_setvals(struct ipath_devdata *dd); +static int ipath_sd_early(struct ipath_devdata *dd); +static int ipath_sd_dactrim(struct ipath_devdata *dd); +/* Set the registers that IBC may muck with to their default "preset" values */ +int ipath_sd7220_presets(struct ipath_devdata *dd); +static int ipath_internal_presets(struct ipath_devdata *dd); +/* Tweak the register (CMUCTRL5) that contains the TRIMSELF controls */ +static int ipath_sd_trimself(struct ipath_devdata *dd, int val); +static int epb_access(struct ipath_devdata *dd, int sdnum, int claim); + +void ipath_set_relock_poll(struct ipath_devdata *dd, int ibup); + +/* + * Below keeps track of whether the "once per power-on" initialization has + * been done, because uC code Version 1.32.17 or higher allows the uC to + * be reset at will, and Automatic Equalization may require it. So the + * state of the reset "pin", as reflected in was_reset parameter to + * ipath_sd7220_init() is no longer valid. Instead, we check for the + * actual uC code having been loaded. + */ +static int ipath_ibsd_ucode_loaded(struct ipath_devdata *dd) +{ + if (!dd->serdes_first_init_done && (ipath_sd7220_ib_vfy(dd) > 0)) + dd->serdes_first_init_done = 1; + return dd->serdes_first_init_done; +} + +/* repeat #define for local use. 
"Real" #define is in ipath_iba7220.c */ +#define INFINIPATH_HWE_IB_UC_MEMORYPARITYERR 0x0000004000000000ULL +#define IB_MPREG5 (EPB_LOC(6, 0, 0xE) | (1L << EPB_IB_UC_CS_SHF)) +#define IB_MPREG6 (EPB_LOC(6, 0, 0xF) | (1U << EPB_IB_UC_CS_SHF)) +#define UC_PAR_CLR_D 8 +#define UC_PAR_CLR_M 0xC +#define IB_CTRL2(chn) (EPB_LOC(chn, 7, 3) | EPB_IB_QUAD0_CS) +#define START_EQ1(chan) EPB_LOC(chan, 7, 0x27) + +void ipath_sd7220_clr_ibpar(struct ipath_devdata *dd) +{ + int ret; + + /* clear, then re-enable parity errs */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG6, + UC_PAR_CLR_D, UC_PAR_CLR_M); + if (ret < 0) { + ipath_dev_err(dd, "Failed clearing IBSerDes Parity err\n"); + goto bail; + } + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG6, 0, + UC_PAR_CLR_M); + + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + udelay(4); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrclear, + INFINIPATH_HWE_IB_UC_MEMORYPARITYERR); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); +bail: + return; +} + +/* + * After a reset or other unusual event, the epb interface may need + * to be re-synchronized, between the host and the uC. 
+ * returns <0 for failure to resync within IBSD_RESYNC_TRIES (not expected) + */ +#define IBSD_RESYNC_TRIES 3 +#define IB_PGUDP(chn) (EPB_LOC((chn), 2, 1) | EPB_IB_QUAD0_CS) +#define IB_CMUDONE(chn) (EPB_LOC((chn), 7, 0xF) | EPB_IB_QUAD0_CS) + +static int ipath_resync_ibepb(struct ipath_devdata *dd) +{ + int ret, pat, tries, chn; + u32 loc; + + ret = -1; + chn = 0; + for (tries = 0; tries < (4 * IBSD_RESYNC_TRIES); ++tries) { + loc = IB_PGUDP(chn); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, 0, 0); + if (ret < 0) { + ipath_dev_err(dd, "Failed read in resync\n"); + continue; + } + if (ret != 0xF0 && ret != 0x55 && tries == 0) + ipath_dev_err(dd, "unexpected pattern in resync\n"); + pat = ret ^ 0xA5; /* alternate F0 and 55 */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, pat, 0xFF); + if (ret < 0) { + ipath_dev_err(dd, "Failed write in resync\n"); + continue; + } + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, 0, 0); + if (ret < 0) { + ipath_dev_err(dd, "Failed re-read in resync\n"); + continue; + } + if (ret != pat) { + ipath_dev_err(dd, "Failed compare1 in resync\n"); + continue; + } + loc = IB_CMUDONE(chn); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, 0, 0); + if (ret < 0) { + ipath_dev_err(dd, "Failed CMUDONE rd in resync\n"); + continue; + } + if ((ret & 0x70) != ((chn << 4) | 0x40)) { + ipath_dev_err(dd, "Bad CMUDONE value %02X, chn %d\n", + ret, chn); + continue; + } + if (++chn == 4) + break; /* Success */ + } + ipath_cdbg(VERBOSE, "Resync in %d tries\n", tries); + return (ret > 0) ? 0 : ret; +} + +/* + * Localize the stuff that should be done to change IB uC reset + * returns <0 for errors. + */ +static int ipath_ibsd_reset(struct ipath_devdata *dd, int assert_rst) +{ + u64 rst_val; + int ret = 0; + unsigned long flags; + + rst_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibserdesctrl); + if (assert_rst) { + /* + * Vendor recommends "interrupting" uC before reset, to + * minimize possible glitches. 
+ */ + spin_lock_irqsave(&dd->ipath_sdepb_lock, flags); + epb_access(dd, IB_7220_SERDES, 1); + rst_val |= 1ULL; + /* Squelch possible parity error from _asserting_ reset */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask & + ~INFINIPATH_HWE_IB_UC_MEMORYPARITYERR); + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibserdesctrl, rst_val); + /* flush write, delay to ensure it took effect */ + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + udelay(2); + /* once it's reset, can remove interrupt */ + epb_access(dd, IB_7220_SERDES, -1); + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + } else { + /* + * Before we de-assert reset, we need to deal with + * possible glitch on the Parity-error line. + * Suppress it around the reset, both in chip-level + * hwerrmask and in IB uC control reg. uC will allow + * it again during startup. + */ + u64 val; + rst_val &= ~(1ULL); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask & + ~INFINIPATH_HWE_IB_UC_MEMORYPARITYERR); + + ret = ipath_resync_ibepb(dd); + if (ret < 0) + ipath_dev_err(dd, "unable to re-sync IB EPB\n"); + + /* set uC control regs to suppress parity errs */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG5, 1, 1); + if (ret < 0) + goto bail; + /* IB uC code past Version 1.32.17 allow suppression of wdog */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG6, 0x80, + 0x80); + if (ret < 0) { + ipath_dev_err(dd, "Failed to set WDOG disable\n"); + goto bail; + } + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibserdesctrl, rst_val); + /* flush write, delay for startup */ + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + udelay(1); + /* clear, then re-enable parity errs */ + ipath_sd7220_clr_ibpar(dd); + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus); + if (val & INFINIPATH_HWE_IB_UC_MEMORYPARITYERR) { + ipath_dev_err(dd, "IBUC Parity still set after RST\n"); + dd->ipath_hwerrmask &= + ~INFINIPATH_HWE_IB_UC_MEMORYPARITYERR; + } + 
ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask); + } + +bail: + return ret; +} + +static void ipath_sd_trimdone_monitor(struct ipath_devdata *dd, + const char *where) +{ + int ret, chn, baduns; + u64 val; + + if (!where) + where = "?"; + + /* give time for reset to settle out in EPB */ + udelay(2); + + ret = ipath_resync_ibepb(dd); + if (ret < 0) + ipath_dev_err(dd, "not able to re-sync IB EPB (%s)\n", where); + + /* Do "sacrificial read" to get EPB in sane state after reset */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_CTRL2(0), 0, 0); + if (ret < 0) + ipath_dev_err(dd, "Failed TRIMDONE 1st read, (%s)\n", where); + + /* Check/show "summary" Trim-done bit in IBCStatus */ + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); + if (val & (1ULL << 11)) + ipath_cdbg(VERBOSE, "IBCS TRIMDONE set (%s)\n", where); + else + ipath_dev_err(dd, "IBCS TRIMDONE clear (%s)\n", where); + + udelay(2); + + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG6, 0x80, 0x80); + if (ret < 0) + ipath_dev_err(dd, "Failed Dummy RMW, (%s)\n", where); + udelay(10); + + baduns = 0; + + for (chn = 3; chn >= 0; --chn) { + /* Read CTRL reg for each channel to check TRIMDONE */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_CTRL2(chn), 0, 0); + if (ret < 0) + ipath_dev_err(dd, "Failed checking TRIMDONE, chn %d" + " (%s)\n", chn, where); + + if (!(ret & 0x10)) { + int probe; + baduns |= (1 << chn); + ipath_dev_err(dd, "TRIMDONE cleared on chn %d (%02X)." 
+ " (%s)\n", chn, ret, where); + probe = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_PGUDP(0), 0, 0); + ipath_dev_err(dd, "probe is %d (%02X)\n", + probe, probe); + probe = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_CTRL2(chn), 0, 0); + ipath_dev_err(dd, "re-read: %d (%02X)\n", + probe, probe); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_CTRL2(chn), 0x10, 0x10); + if (ret < 0) + ipath_dev_err(dd, + "Err on TRIMDONE rewrite1\n"); + } + } + for (chn = 3; chn >= 0; --chn) { + /* Read CTRL reg for each channel to check TRIMDONE */ + if (baduns & (1 << chn)) { + ipath_dev_err(dd, + "Reseting TRIMDONE on chn %d (%s)\n", + chn, where); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_CTRL2(chn), 0x10, 0x10); + if (ret < 0) + ipath_dev_err(dd, "Failed re-setting " + "TRIMDONE, chn %d (%s)\n", + chn, where); + } + } +} + +/* + * Below is portion of IBA7220-specific bringup_serdes() that actually + * deals with registers and memory within the SerDes itself. + * Post IB uC code version 1.32.17, was_reset being 1 is not really + * informative, so we double-check. + */ +int ipath_sd7220_init(struct ipath_devdata *dd, int was_reset) +{ + int ret = 1; /* default to failure */ + int first_reset; + int val_stat; + + if (!was_reset) { + /* entered with reset not asserted, we need to do it */ + ipath_ibsd_reset(dd, 1); + ipath_sd_trimdone_monitor(dd, "Driver-reload"); + } + + /* Substitute our deduced value for was_reset */ + ret = ipath_ibsd_ucode_loaded(dd); + if (ret < 0) { + ret = 1; + goto done; + } + first_reset = !ret; /* First reset if IBSD uCode not yet loaded */ + + /* + * Alter some regs per vendor latest doc, reset-defaults + * are not right for IB. + */ + ret = ipath_sd_early(dd); + if (ret < 0) { + ipath_dev_err(dd, "Failed to set IB SERDES early defaults\n"); + ret = 1; + goto done; + } + + /* + * Set DAC manual trim IB. + * We only do this once after chip has been reset (usually + * same as once per system boot). 
+ */ + if (first_reset) { + ret = ipath_sd_dactrim(dd); + if (ret < 0) { + ipath_dev_err(dd, "Failed IB SERDES DAC trim\n"); + ret = 1; + goto done; + } + } + + /* + * Set various registers (DDS and RXEQ) that will be + * controlled by IBC (in 1.2 mode) to reasonable preset values + * Calling the "internal" version avoids the "check for needed" + * and "trimdone monitor" that might be counter-productive. + */ + ret = ipath_internal_presets(dd); + if (ret < 0) { + ipath_dev_err(dd, "Failed to set IB SERDES presets\n"); + ret = 1; + goto done; + } + ret = ipath_sd_trimself(dd, 0x80); + if (ret < 0) { + ipath_dev_err(dd, "Failed to set IB SERDES TRIMSELF\n"); + ret = 1; + goto done; + } + + /* Load image, then try to verify */ + ret = 0; /* Assume success */ + if (first_reset) { + int vfy; + int trim_done; + ipath_dbg("SerDes uC was reset, reloading PRAM\n"); + ret = ipath_sd7220_ib_load(dd); + if (ret < 0) { + ipath_dev_err(dd, "Failed to load IB SERDES image\n"); + ret = 1; + goto done; + } + + /* Loaded image, try to verify */ + vfy = ipath_sd7220_ib_vfy(dd); + if (vfy != ret) { + ipath_dev_err(dd, "SERDES PRAM VFY failed\n"); + ret = 1; + goto done; + } + /* + * Loaded and verified. Almost good... + * hold "success" in ret + */ + ret = 0; + + /* + * Prev steps all worked, continue bringup + * De-assert RESET to uC, only in first reset, to allow + * trimming. + * + * Since our default setup sets START_EQ1 to + * PRESET, we need to clear that for this very first run. + */ + ret = ibsd_mod_allchnls(dd, START_EQ1(0), 0, 0x38); + if (ret < 0) { + ipath_dev_err(dd, "Failed clearing START_EQ1\n"); + ret = 1; + goto done; + } + + ipath_ibsd_reset(dd, 0); + /* + * If this is not the first reset, trimdone should be set + * already. + */ + trim_done = ipath_sd_trimdone_poll(dd); + /* + * Whether or not trimdone succeeded, we need to put the + * uC back into reset to avoid a possible fight with the + * IBC state-machine. 
+ */ + ipath_ibsd_reset(dd, 1); + + if (!trim_done) { + ipath_dev_err(dd, "No TRIMDONE seen\n"); + ret = 1; + goto done; + } + + ipath_sd_trimdone_monitor(dd, "First-reset"); + /* Remember so we do not re-do the load, dactrim, etc. */ + dd->serdes_first_init_done = 1; + } + /* + * Setup for channel training and load values for + * RxEq and DDS in tables used by IBC in IB1.2 mode + */ + + val_stat = ipath_sd_setvals(dd); + if (val_stat < 0) + ret = 1; +done: + /* start relock timer regardless, but start at 1 second */ + ipath_set_relock_poll(dd, -1); + return ret; +} + +#define EPB_ACC_REQ 1 +#define EPB_ACC_GNT 0x100 +#define EPB_DATA_MASK 0xFF +#define EPB_RD (1ULL << 24) +#define EPB_TRANS_RDY (1ULL << 31) +#define EPB_TRANS_ERR (1ULL << 30) +#define EPB_TRANS_TRIES 5 + +/* + * query, claim, release ownership of the EPB (External Parallel Bus) + * for a specified SERDES. + * the "claim" parameter is >0 to claim, <0 to release, 0 to query. + * Returns <0 for errors, >0 if we had ownership, else 0. + */ +static int epb_access(struct ipath_devdata *dd, int sdnum, int claim) +{ + u16 acc; + u64 accval; + int owned = 0; + u64 oct_sel = 0; + + switch (sdnum) { + case IB_7220_SERDES : + /* + * The IB SERDES "ownership" is fairly simple. A single each + * request/grant. + */ + acc = dd->ipath_kregs->kr_ib_epbacc; + break; + case PCIE_SERDES0 : + case PCIE_SERDES1 : + /* PCIe SERDES has two "octants", need to select which */ + acc = dd->ipath_kregs->kr_pcie_epbacc; + oct_sel = (2 << (sdnum - PCIE_SERDES0)); + break; + default : + return 0; + } + + /* Make sure any outstanding transaction was seen */ + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + udelay(15); + + accval = ipath_read_kreg32(dd, acc); + + owned = !!(accval & EPB_ACC_GNT); + if (claim < 0) { + /* Need to release */ + u64 pollval; + /* + * The only writeable bits are the request and CS. 
+ * Both should be clear. + */ + u64 newval = 0; + ipath_write_kreg(dd, acc, newval); + /* First read after write is not trustworthy */ + pollval = ipath_read_kreg32(dd, acc); + udelay(5); + pollval = ipath_read_kreg32(dd, acc); + if (pollval & EPB_ACC_GNT) + owned = -1; + } else if (claim > 0) { + /* Need to claim */ + u64 pollval; + u64 newval = EPB_ACC_REQ | oct_sel; + ipath_write_kreg(dd, acc, newval); + /* First read after write is not trustworthy */ + pollval = ipath_read_kreg32(dd, acc); + udelay(5); + pollval = ipath_read_kreg32(dd, acc); + if (!(pollval & EPB_ACC_GNT)) + owned = -1; + } + return owned; +} + +/* + * Helper to deal with the race condition of a write..read sequence to EPB regs + */ +static int epb_trans(struct ipath_devdata *dd, u16 reg, u64 i_val, u64 *o_vp) +{ + int tries; + u64 transval; + + ipath_write_kreg(dd, reg, i_val); + /* Throw away first read, as RDY bit may be stale */ + transval = ipath_read_kreg64(dd, reg); + + for (tries = EPB_TRANS_TRIES; tries; --tries) { + transval = ipath_read_kreg32(dd, reg); + if (transval & EPB_TRANS_RDY) + break; + udelay(5); + } + if (transval & EPB_TRANS_ERR) + return -1; + if (tries > 0 && o_vp) + *o_vp = transval; + return tries; +} + +/** + * ipath_sd7220_reg_mod - modify SERDES register + * @dd: the infinipath device + * @sdnum: which SERDES to access + * @loc: location - channel, element, register, as packed by EPB_LOC() macro. + * @wd: Write Data - value to set in register + * @mask: ones where data should be spliced into reg. + * + * Basic register read/modify/write, with un-needed accesses elided. That is, + * a mask of zero will prevent write, while a mask of 0xFF will prevent read. + * Returns the current (presumed, if a write was done) contents of the selected + * register, or <0 on errors.
+ */ +static int ipath_sd7220_reg_mod(struct ipath_devdata *dd, int sdnum, u32 loc, + u32 wd, u32 mask) +{ + u16 trans; + u64 transval; + int owned; + int tries, ret; + unsigned long flags; + + switch (sdnum) { + case IB_7220_SERDES : + trans = dd->ipath_kregs->kr_ib_epbtrans; + break; + case PCIE_SERDES0 : + case PCIE_SERDES1 : + trans = dd->ipath_kregs->kr_pcie_epbtrans; + break; + default : + return -1; + } + + /* + * All access is locked in software (vs other host threads) and + * hardware (vs uC access). + */ + spin_lock_irqsave(&dd->ipath_sdepb_lock, flags); + + owned = epb_access(dd, sdnum, 1); + if (owned < 0) { + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + return -1; + } + ret = 0; + for (tries = EPB_TRANS_TRIES; tries; --tries) { + transval = ipath_read_kreg32(dd, trans); + if (transval & EPB_TRANS_RDY) + break; + udelay(5); + } + + if (tries > 0) { + tries = 1; /* to make read-skip work */ + if (mask != 0xFF) { + /* + * Not a pure write, so need to read. + * loc encodes chip-select as well as address + */ + transval = loc | EPB_RD; + tries = epb_trans(dd, trans, transval, &transval); + } + if (tries > 0 && mask != 0) { + /* + * Not a pure read, so need to write. + */ + wd = (wd & mask) | (transval & ~mask); + transval = loc | (wd & EPB_DATA_MASK); + tries = epb_trans(dd, trans, transval, &transval); + } + } + /* else, failed to see ready, what error-handling? */ + + /* + * Release bus. Failure is an error. + */ + if (epb_access(dd, sdnum, -1) < 0) + ret = -1; + else + ret = transval & EPB_DATA_MASK; + + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + if (tries <= 0) + ret = -1; + return ret; +} + +#define EPB_ROM_R (2) +#define EPB_ROM_W (1) +/* + * Below, all uC-related, use appropriate UC_CS, depending + * on which SerDes is used. 
+ */ +#define EPB_UC_CTL EPB_LOC(6, 0, 0) +#define EPB_MADDRL EPB_LOC(6, 0, 2) +#define EPB_MADDRH EPB_LOC(6, 0, 3) +#define EPB_ROMDATA EPB_LOC(6, 0, 4) +#define EPB_RAMDATA EPB_LOC(6, 0, 5) + +/* Transfer data to/from uC Program RAM of IB or PCIe SerDes */ +static int ipath_sd7220_ram_xfer(struct ipath_devdata *dd, int sdnum, u32 loc, + u8 *buf, int cnt, int rd_notwr) +{ + u16 trans; + u64 transval; + u64 csbit; + int owned; + int tries; + int sofar; + int addr; + int ret; + unsigned long flags; + const char *op; + + /* Pick appropriate transaction reg and "Chip select" for this serdes */ + switch (sdnum) { + case IB_7220_SERDES : + csbit = 1ULL << EPB_IB_UC_CS_SHF; + trans = dd->ipath_kregs->kr_ib_epbtrans; + break; + case PCIE_SERDES0 : + case PCIE_SERDES1 : + /* PCIe SERDES has uC "chip select" in different bit, too */ + csbit = 1ULL << EPB_PCIE_UC_CS_SHF; + trans = dd->ipath_kregs->kr_pcie_epbtrans; + break; + default : + return -1; + } + + op = rd_notwr ? "Rd" : "Wr"; + spin_lock_irqsave(&dd->ipath_sdepb_lock, flags); + + owned = epb_access(dd, sdnum, 1); + if (owned < 0) { + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + ipath_dbg("Could not get %s access to %s EPB: %X, loc %X\n", + op, (sdnum == IB_7220_SERDES) ? "IB" : "PCIe", + owned, loc); + return -1; + } + + /* + * In future code, we may need to distinguish several address ranges, + * and select various memories based on this. For now, just trim + * "loc" (location including address and memory select) to + * "addr" (address within memory). We will only support PRAM. + * The memory is 8KB. + */ + addr = loc & 0x1FFF; + for (tries = EPB_TRANS_TRIES; tries; --tries) { + transval = ipath_read_kreg32(dd, trans); + if (transval & EPB_TRANS_RDY) + break; + udelay(5); + } + + sofar = 0; + if (tries <= 0) + ipath_dbg("No initial RDY on EPB access request\n"); + else { + /* + * Every "memory" access is doubly-indirect. + * We set two bytes of address, then read/write + * one or more bytes of data.
+ */ + + /* First, we set control to "Read" or "Write" */ + transval = csbit | EPB_UC_CTL | + (rd_notwr ? EPB_ROM_R : EPB_ROM_W); + tries = epb_trans(dd, trans, transval, &transval); + if (tries <= 0) + ipath_dbg("No EPB response to uC %s cmd\n", op); + while (tries > 0 && sofar < cnt) { + if (!sofar) { + /* Only set address at start of chunk */ + int addrbyte = (addr + sofar) >> 8; + transval = csbit | EPB_MADDRH | addrbyte; + tries = epb_trans(dd, trans, transval, + &transval); + if (tries <= 0) { + ipath_dbg("No EPB response ADDRH\n"); + break; + } + addrbyte = (addr + sofar) & 0xFF; + transval = csbit | EPB_MADDRL | addrbyte; + tries = epb_trans(dd, trans, transval, + &transval); + if (tries <= 0) { + ipath_dbg("No EPB response ADDRL\n"); + break; + } + } + + if (rd_notwr) + transval = csbit | EPB_ROMDATA | EPB_RD; + else + transval = csbit | EPB_ROMDATA | buf[sofar]; + tries = epb_trans(dd, trans, transval, &transval); + if (tries <= 0) { + ipath_dbg("No EPB response DATA\n"); + break; + } + if (rd_notwr) + buf[sofar] = transval & EPB_DATA_MASK; + ++sofar; + } + /* Finally, clear control-bit for Read or Write */ + transval = csbit | EPB_UC_CTL; + tries = epb_trans(dd, trans, transval, &transval); + if (tries <= 0) + ipath_dbg("No EPB response to drop of uC %s cmd\n", op); + } + + ret = sofar; + /* Release bus. 
Failure is an error */ + if (epb_access(dd, sdnum, -1) < 0) + ret = -1; + + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + if (tries <= 0) { + ipath_dbg("SERDES PRAM %s failed after %d bytes\n", op, sofar); + ret = -1; + } + return ret; +} + +#define PROG_CHUNK 64 + +int ipath_sd7220_prog_ld(struct ipath_devdata *dd, int sdnum, + u8 *img, int len, int offset) +{ + int cnt, sofar, req; + + sofar = 0; + while (sofar < len) { + req = len - sofar; + if (req > PROG_CHUNK) + req = PROG_CHUNK; + cnt = ipath_sd7220_ram_xfer(dd, sdnum, offset + sofar, + img + sofar, req, 0); + if (cnt < req) { + sofar = -1; + break; + } + sofar += req; + } + return sofar; +} + +#define VFY_CHUNK 64 +#define SD_PRAM_ERROR_LIMIT 42 + +int ipath_sd7220_prog_vfy(struct ipath_devdata *dd, int sdnum, + const u8 *img, int len, int offset) +{ + int cnt, sofar, req, idx, errors; + unsigned char readback[VFY_CHUNK]; + + errors = 0; + sofar = 0; + while (sofar < len) { + req = len - sofar; + if (req > VFY_CHUNK) + req = VFY_CHUNK; + cnt = ipath_sd7220_ram_xfer(dd, sdnum, sofar + offset, + readback, req, 1); + if (cnt < req) { + /* failed in read itself */ + sofar = -1; + break; + } + for (idx = 0; idx < cnt; ++idx) { + if (readback[idx] != img[idx+sofar]) + ++errors; + } + sofar += cnt; + } + return errors ? -errors : sofar; +} + +/* IRQ not set up at this point in init, so we poll. */ +#define IB_SERDES_TRIM_DONE (1ULL << 11) +#define TRIM_TMO (30) + +static int ipath_sd_trimdone_poll(struct ipath_devdata *dd) +{ + int trim_tmo, ret; + uint64_t val; + + /* + * Default to failure, so IBC will not start + * without IB_SERDES_TRIM_DONE. 
+ */ + ret = 0; + for (trim_tmo = 0; trim_tmo < TRIM_TMO; ++trim_tmo) { + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); + if (val & IB_SERDES_TRIM_DONE) { + ipath_cdbg(VERBOSE, "TRIMDONE after %d\n", trim_tmo); + ret = 1; + break; + } + msleep(10); + } + if (trim_tmo >= TRIM_TMO) { + ipath_dev_err(dd, "No TRIMDONE in %d tries\n", trim_tmo); + ret = 0; + } + return ret; +} + +#define TX_FAST_ELT (9) + +/* + * Set the "negotiation" values for SERDES. These are used by the IB1.2 + * link negotiation. The macros below are an attempt to keep the values a + * little more human-editable. + * First, values related to Drive De-emphasis Settings. + */ + +#define NUM_DDS_REGS 6 +#define DDS_REG_MAP 0x76A910 /* LSB-first list of regs (in elt 9) to mod */ + +#define DDS_VAL(amp_d, main_d, ipst_d, ipre_d, amp_s, main_s, ipst_s, ipre_s) \ + { { ((amp_d & 0x1F) << 1) | 1, ((amp_s & 0x1F) << 1) | 1, \ + (main_d << 3) | 4 | (ipre_d >> 2), \ + (main_s << 3) | 4 | (ipre_s >> 2), \ + ((ipst_d & 0xF) << 1) | ((ipre_d & 3) << 6) | 0x21, \ + ((ipst_s & 0xF) << 1) | ((ipre_s & 3) << 6) | 0x21 } } + +static struct dds_init { + uint8_t reg_vals[NUM_DDS_REGS]; +} dds_init_vals[] = { + /* DDR(FDR) SDR(HDR) */ + /* Vendor recommends below for 3m cable */ +#define DDS_3M 0 + DDS_VAL(31, 19, 12, 0, 29, 22, 9, 0), + DDS_VAL(31, 12, 15, 4, 31, 15, 15, 1), + DDS_VAL(31, 13, 15, 3, 31, 16, 15, 0), + DDS_VAL(31, 14, 15, 2, 31, 17, 14, 0), + DDS_VAL(31, 15, 15, 1, 31, 18, 13, 0), + DDS_VAL(31, 16, 15, 0, 31, 19, 12, 0), + DDS_VAL(31, 17, 14, 0, 31, 20, 11, 0), + DDS_VAL(31, 18, 13, 0, 30, 21, 10, 0), + DDS_VAL(31, 20, 11, 0, 28, 23, 8, 0), + DDS_VAL(31, 21, 10, 0, 27, 24, 7, 0), + DDS_VAL(31, 22, 9, 0, 26, 25, 6, 0), + DDS_VAL(30, 23, 8, 0, 25, 26, 5, 0), + DDS_VAL(29, 24, 7, 0, 23, 27, 4, 0), + /* Vendor recommends below for 1m cable */ +#define DDS_1M 13 + DDS_VAL(28, 25, 6, 0, 21, 28, 3, 0), + DDS_VAL(27, 26, 5, 0, 19, 29, 2, 0), + DDS_VAL(25, 27, 4, 0, 17, 30, 1, 0) +}; + +/* + * Next,
values related to Receive Equalization. + * In comments, FDR (Full) is IB DDR, HDR (Half) is IB SDR + */ +/* Hardware packs an element number and register address thus: */ +#define RXEQ_INIT_RDESC(elt, addr) (((elt) & 0xF) | ((addr) << 4)) +#define RXEQ_VAL(elt, adr, val0, val1, val2, val3) \ + {RXEQ_INIT_RDESC((elt), (adr)), {(val0), (val1), (val2), (val3)} } + +#define RXEQ_VAL_ALL(elt, adr, val) \ + {RXEQ_INIT_RDESC((elt), (adr)), {(val), (val), (val), (val)} } + +#define RXEQ_SDR_DFELTH 0 +#define RXEQ_SDR_TLTH 0 +#define RXEQ_SDR_G1CNT_Z1CNT 0x11 +#define RXEQ_SDR_ZCNT 23 + +static struct rxeq_init { + u16 rdesc; /* in form used in SerDesDDSRXEQ */ + u8 rdata[4]; +} rxeq_init_vals[] = { + /* Set Rcv Eq. to Preset mode */ + RXEQ_VAL_ALL(7, 0x27, 0x10), + /* Set DFELTHFDR/HDR thresholds */ + RXEQ_VAL(7, 8, 0, 0, 0, 0), /* FDR */ + RXEQ_VAL(7, 0x21, 0, 0, 0, 0), /* HDR */ + /* Set TLTHFDR/HDR thresholds */ + RXEQ_VAL(7, 9, 2, 2, 2, 2), /* FDR */ + RXEQ_VAL(7, 0x23, 2, 2, 2, 2), /* HDR */ + /* Set Preamp setting 2 (ZFR/ZCNT) */ + RXEQ_VAL(7, 0x1B, 12, 12, 12, 12), /* FDR */ + RXEQ_VAL(7, 0x1C, 12, 12, 12, 12), /* HDR */ + /* Set Preamp DC gain and Setting 1 (GFR/GHR) */ + RXEQ_VAL(7, 0x1E, 0x10, 0x10, 0x10, 0x10), /* FDR */ + RXEQ_VAL(7, 0x1F, 0x10, 0x10, 0x10, 0x10), /* HDR */ + /* Toggle RELOCK (in VCDL_CTRL0) to lock to data */ + RXEQ_VAL_ALL(6, 6, 0x20), /* Set D5 High */ + RXEQ_VAL_ALL(6, 6, 0), /* Set D5 Low */ +}; + +/* There are 17 values from vendor, but IBC only accesses the first 16 */ +#define DDS_ROWS (16) +#define RXEQ_ROWS ARRAY_SIZE(rxeq_init_vals) + +static int ipath_sd_setvals(struct ipath_devdata *dd) +{ + int idx, midx; + int min_idx; /* Minimum index for this portion of table */ + uint32_t dds_reg_map; + u64 __iomem *taddr, *iaddr; + uint64_t data; + uint64_t sdctl; + + taddr = dd->ipath_kregbase + KR_IBSerDesMappTable; + iaddr = dd->ipath_kregbase + dd->ipath_kregs->kr_ib_ddsrxeq; + + /* + * Init the DDS section of the table.
+ * Each "row" of the table provokes NUM_DDS_REGS writes, to the + * registers indicated in DDS_REG_MAP. + */ + sdctl = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibserdesctrl); + sdctl = (sdctl & ~(0x1f << 8)) | (NUM_DDS_REGS << 8); + sdctl = (sdctl & ~(0x1f << 13)) | (RXEQ_ROWS << 13); + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibserdesctrl, sdctl); + + /* + * Iterate down the table within the loop for each register to store. + */ + dds_reg_map = DDS_REG_MAP; + for (idx = 0; idx < NUM_DDS_REGS; ++idx) { + data = ((dds_reg_map & 0xF) << 4) | TX_FAST_ELT; + writeq(data, iaddr + idx); + mmiowb(); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + dds_reg_map >>= 4; + for (midx = 0; midx < DDS_ROWS; ++midx) { + u64 __iomem *daddr = taddr + ((midx << 4) + idx); + data = dds_init_vals[midx].reg_vals[idx]; + writeq(data, daddr); + mmiowb(); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + } /* End inner for (vals for this reg, each row) */ + } /* end outer for (regs to be stored) */ + + /* + * Init the RXEQ section of the table. As explained in the comment + * above rxeq_init_vals[], this runs in a different order, as the pattern + * of register references is more complex, but there are only + * four "data" values per register.
+ */ + min_idx = idx; /* RXEQ indices pick up where DDS left off */ + taddr += 0x100; /* RXEQ data is in second half of table */ + /* Iterate through RXEQ register addresses */ + for (idx = 0; idx < RXEQ_ROWS; ++idx) { + int didx; /* "destination" */ + int vidx; + + /* didx is offset by min_idx to address RXEQ range of regs */ + didx = idx + min_idx; + /* Store the next RXEQ register address */ + writeq(rxeq_init_vals[idx].rdesc, iaddr + didx); + mmiowb(); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + /* Iterate through RXEQ values */ + for (vidx = 0; vidx < 4; vidx++) { + data = rxeq_init_vals[idx].rdata[vidx]; + writeq(data, taddr + (vidx << 6) + idx); + mmiowb(); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + } + } /* end outer for (Reg-writes for RXEQ) */ + return 0; +} + +#define CMUCTRL5 EPB_LOC(7, 0, 0x15) +#define RXHSCTRL0(chan) EPB_LOC(chan, 6, 0) +#define VCDL_DAC2(chan) EPB_LOC(chan, 6, 5) +#define VCDL_CTRL0(chan) EPB_LOC(chan, 6, 6) +#define VCDL_CTRL2(chan) EPB_LOC(chan, 6, 8) +#define START_EQ2(chan) EPB_LOC(chan, 7, 0x28) + +static int ibsd_sto_noisy(struct ipath_devdata *dd, int loc, int val, int mask) +{ + int ret = -1; + int sloc; /* shifted loc, for messages */ + + loc |= (1U << EPB_IB_QUAD0_CS_SHF); + sloc = loc >> EPB_ADDR_SHF; + + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, val, mask); + if (ret < 0) + ipath_dev_err(dd, "Write failed: elt %d," + " addr 0x%X, chnl %d, val 0x%02X, mask 0x%02X\n", + (sloc & 0xF), (sloc >> 9) & 0x3f, (sloc >> 4) & 7, + val & 0xFF, mask & 0xFF); + return ret; +} + +/* + * Repeat a "store" across all channels of the IB SerDes. + * Although nominally it inherits the "read value" of the last + * channel it modified, the only really useful return is <0 for + * failure, >= 0 for success. The parameter 'loc' is assumed to + * be the location for the channel-0 copy of the register to + * be modified. 
+ */ +static int ibsd_mod_allchnls(struct ipath_devdata *dd, int loc, int val, + int mask) +{ + int ret = -1; + int chnl; + + if (loc & EPB_GLOBAL_WR) { + /* + * Our caller has assured us that we can set all four + * channels at once. Trust that. If mask is not 0xFF, + * we will read the _specified_ channel for our starting + * value. + */ + loc |= (1U << EPB_IB_QUAD0_CS_SHF); + chnl = (loc >> (4 + EPB_ADDR_SHF)) & 7; + if (mask != 0xFF) { + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + loc & ~EPB_GLOBAL_WR, 0, 0); + if (ret < 0) { + int sloc = loc >> EPB_ADDR_SHF; + ipath_dev_err(dd, "pre-read failed: elt %d," + " addr 0x%X, chnl %d\n", (sloc & 0xF), + (sloc >> 9) & 0x3f, chnl); + return ret; + } + val = (ret & ~mask) | (val & mask); + } + loc &= ~(7 << (4+EPB_ADDR_SHF)); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, val, 0xFF); + if (ret < 0) { + int sloc = loc >> EPB_ADDR_SHF; + ipath_dev_err(dd, "Global WR failed: elt %d," + " addr 0x%X, val %02X\n", + (sloc & 0xF), (sloc >> 9) & 0x3f, val); + } + return ret; + } + /* Clear "channel" and set CS so we can simply iterate */ + loc &= ~(7 << (4+EPB_ADDR_SHF)); + loc |= (1U << EPB_IB_QUAD0_CS_SHF); + for (chnl = 0; chnl < 4; ++chnl) { + int cloc; + cloc = loc | (chnl << (4+EPB_ADDR_SHF)); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, cloc, val, mask); + if (ret < 0) { + int sloc = loc >> EPB_ADDR_SHF; + ipath_dev_err(dd, "Write failed: elt %d," + " addr 0x%X, chnl %d, val 0x%02X," + " mask 0x%02X\n", + (sloc & 0xF), (sloc >> 9) & 0x3f, chnl, + val & 0xFF, mask & 0xFF); + break; + } + } + return ret; +} + +/* + * Set the Tx values normally modified by IBC in IB1.2 mode to default + * values, as gotten from first row of init table. 
+ */ +static int set_dds_vals(struct ipath_devdata *dd, struct dds_init *ddi) +{ + int ret; + int idx, reg, data; + uint32_t regmap; + + regmap = DDS_REG_MAP; + for (idx = 0; idx < NUM_DDS_REGS; ++idx) { + reg = (regmap & 0xF); + regmap >>= 4; + data = ddi->reg_vals[idx]; + /* Vendor says RMW not needed for these regs, use 0xFF mask */ + ret = ibsd_mod_allchnls(dd, EPB_LOC(0, 9, reg), data, 0xFF); + if (ret < 0) + break; + } + return ret; +} + +/* + * Set the Rx values normally modified by IBC in IB1.2 mode to default + * values, as gotten from selected column of init table. + */ +static int set_rxeq_vals(struct ipath_devdata *dd, int vsel) +{ + int ret; + int ridx; + int cnt = ARRAY_SIZE(rxeq_init_vals); + + for (ridx = 0; ridx < cnt; ++ridx) { + int elt, reg, val, loc; + elt = rxeq_init_vals[ridx].rdesc & 0xF; + reg = rxeq_init_vals[ridx].rdesc >> 4; + loc = EPB_LOC(0, elt, reg); + val = rxeq_init_vals[ridx].rdata[vsel]; + /* mask of 0xFF, because hardware does full-byte store. */ + ret = ibsd_mod_allchnls(dd, loc, val, 0xFF); + if (ret < 0) + break; + } + return ret; +} + +/* + * Set the default values (row 0) for DDR Driver De-emphasis. + * We do this initially and whenever we turn off IB-1.2. + * The "default" values for Rx equalization are also stored to + * SerDes registers. Formerly (and still default), we used set 2. + * For experimenting with cables and link-partners, we allow changing + * that via a module parameter.
+ */ +static unsigned ipath_rxeq_set = 2; +module_param_named(rxeq_default_set, ipath_rxeq_set, uint, + S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(rxeq_default_set, + "Which set [0..3] of Rx Equalization values is default"); + +static int ipath_internal_presets(struct ipath_devdata *dd) +{ + int ret = 0; + + ret = set_dds_vals(dd, dds_init_vals + DDS_3M); + + if (ret < 0) + ipath_dev_err(dd, "Failed to set default DDS values\n"); + ret = set_rxeq_vals(dd, ipath_rxeq_set & 3); + if (ret < 0) + ipath_dev_err(dd, "Failed to set default RXEQ values\n"); + return ret; +} + +int ipath_sd7220_presets(struct ipath_devdata *dd) +{ + int ret = 0; + + if (!dd->ipath_presets_needed) + return ret; + dd->ipath_presets_needed = 0; + /* Assert uC reset, so we don't clash with it. */ + ipath_ibsd_reset(dd, 1); + udelay(2); + ipath_sd_trimdone_monitor(dd, "link-down"); + + ret = ipath_internal_presets(dd); + return ret; +} + +static int ipath_sd_trimself(struct ipath_devdata *dd, int val) +{ + return ibsd_sto_noisy(dd, CMUCTRL5, val, 0xFF); +} + +static int ipath_sd_early(struct ipath_devdata *dd) +{ + int ret = -1; /* Default failed */ + int chnl; + + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, RXHSCTRL0(chnl), 0xD4, 0xFF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, VCDL_DAC2(chnl), 0x2D, 0xFF); + if (ret < 0) + goto bail; + } + /* more fine-tuning of what will be default */ + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, VCDL_CTRL2(chnl), 3, 0xF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, START_EQ1(chnl), 0x10, 0xFF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, START_EQ2(chnl), 0x30, 0xFF); + if (ret < 0) + goto bail; + } +bail: + return ret; +} + +#define BACTRL(chnl) EPB_LOC(chnl, 6, 0x0E) +#define LDOUTCTRL1(chnl) EPB_LOC(chnl, 7, 6) +#define RXHSSTATUS(chnl) EPB_LOC(chnl, 6, 0xF) + +static int
ipath_sd_dactrim(struct ipath_devdata *dd) +{ + int ret = -1; /* Default failed */ + int chnl; + + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, BACTRL(chnl), 0x40, 0xFF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, LDOUTCTRL1(chnl), 0x04, 0xFF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, RXHSSTATUS(chnl), 0x04, 0xFF); + if (ret < 0) + goto bail; + } + /* + * Delay for the max possible number of steps, with slop. + * Each step is about 4usec. + */ + udelay(415); + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, LDOUTCTRL1(chnl), 0x00, 0xFF); + if (ret < 0) + goto bail; + } +bail: + return ret; +} + +#define RELOCK_FIRST_MS 3 +#define RXLSPPM(chan) EPB_LOC(chan, 0, 2) +void ipath_toggle_rclkrls(struct ipath_devdata *dd) +{ + int loc = RXLSPPM(0) | EPB_GLOBAL_WR; + int ret; + + ret = ibsd_mod_allchnls(dd, loc, 0, 0x80); + if (ret < 0) + ipath_dev_err(dd, "RCLKRLS failed to clear D7\n"); + else { + udelay(1); + ibsd_mod_allchnls(dd, loc, 0x80, 0x80); + } + /* And again for good measure */ + udelay(1); + ret = ibsd_mod_allchnls(dd, loc, 0, 0x80); + if (ret < 0) + ipath_dev_err(dd, "RCLKRLS failed to clear D7\n"); + else { + udelay(1); + ibsd_mod_allchnls(dd, loc, 0x80, 0x80); + } + /* Now reset xgxs and IBC to complete the recovery */ + dd->ipath_f_xgxs_reset(dd); +} + +/* + * Shut down the timer that polls for relock occasions, if needed. + * This is "hooked" from ipath_7220_quiet_serdes(), which is called + * just before ipath_shutdown_device() in ipath_driver.c shuts down all + * the other timers. + */ +void ipath_shutdown_relock_poll(struct ipath_devdata *dd) +{ + struct ipath_relock *irp = &dd->ipath_relock_singleton; + if (atomic_read(&irp->ipath_relock_timer_active)) { + del_timer_sync(&irp->ipath_relock_timer); + atomic_set(&irp->ipath_relock_timer_active, 0); + } +} + +static unsigned ipath_relock_by_timer = 1;
+module_param_named(relock_by_timer, ipath_relock_by_timer, uint, + S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(relock_by_timer, "Allow relock attempt if link not up"); + +static void ipath_run_relock(unsigned long opaque) +{ + struct ipath_devdata *dd = (struct ipath_devdata *)opaque; + struct ipath_relock *irp = &dd->ipath_relock_singleton; + u64 val, ltstate; + + if (!(dd->ipath_flags & IPATH_INITTED)) { + /* Not yet up, just reenable the timer for later */ + irp->ipath_relock_interval = HZ; + mod_timer(&irp->ipath_relock_timer, jiffies + HZ); + return; + } + + /* + * Check link-training state for "stuck" state. + * If found, try relock and schedule another try at + * an exponentially growing delay, maxed at one second. + * If not stuck, our work is done. + */ + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); + ltstate = ipath_ib_linktrstate(dd, val); + + if (ltstate <= INFINIPATH_IBCS_LT_STATE_CFGWAITRMT + && ltstate != INFINIPATH_IBCS_LT_STATE_LINKUP) { + int timeoff; + /* Not up yet.
Try again, if allowed by module-param */ + if (ipath_relock_by_timer) { + if (dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) + ipath_cdbg(VERBOSE, "Skip RELOCK in AUTONEG\n"); + else if (!(dd->ipath_flags & IPATH_IB_LINK_DISABLED)) { + ipath_cdbg(VERBOSE, "RELOCK\n"); + ipath_toggle_rclkrls(dd); + } + } + /* re-set timer for next check */ + timeoff = irp->ipath_relock_interval << 1; + if (timeoff > HZ) + timeoff = HZ; + irp->ipath_relock_interval = timeoff; + + mod_timer(&irp->ipath_relock_timer, jiffies + timeoff); + } else { + /* Up, so no more need to check so often */ + mod_timer(&irp->ipath_relock_timer, jiffies + HZ); + } +} + +void ipath_set_relock_poll(struct ipath_devdata *dd, int ibup) +{ + struct ipath_relock *irp = &dd->ipath_relock_singleton; + + if (ibup > 0) { + /* we are now up, so relax timer to 1 second interval */ + if (atomic_read(&irp->ipath_relock_timer_active)) + mod_timer(&irp->ipath_relock_timer, jiffies + HZ); + } else { + /* Transition to down, (re-)set timer to short interval. */ + int timeout; + timeout = (HZ * ((ibup == -1) ? 1000 : RELOCK_FIRST_MS))/1000; + if (timeout == 0) + timeout = 1; + /* If timer has not yet been started, do so. 
*/ + if (atomic_inc_return(&irp->ipath_relock_timer_active) == 1) { + init_timer(&irp->ipath_relock_timer); + irp->ipath_relock_timer.function = ipath_run_relock; + irp->ipath_relock_timer.data = (unsigned long) dd; + irp->ipath_relock_interval = timeout; + irp->ipath_relock_timer.expires = jiffies + timeout; + add_timer(&irp->ipath_relock_timer); + } else { + irp->ipath_relock_interval = timeout; + mod_timer(&irp->ipath_relock_timer, jiffies + timeout); + atomic_dec(&irp->ipath_relock_timer_active); + } + } +} + From ralph.campbell at qlogic.com Wed Apr 2 15:50:13 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:13 -0700 Subject: [ofa-general] [PATCH 14/20] IB/ipath - Add IBA7220 specific initialization data In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225013.28598.14756.stgit@eng-46.mv.qlogic.com> This patch adds binary data to initialize the IB SERDES. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_sd7220_img.c | 1082 ++++++++++++++++++++++++ 1 files changed, 1082 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_sd7220_img.c b/drivers/infiniband/hw/ipath/ipath_sd7220_img.c new file mode 100644 index 0000000..5ef59da --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_sd7220_img.c @@ -0,0 +1,1082 @@ +/* + * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +/* + * This file contains the memory image from the vendor, to be copied into + * the IB SERDES of the IBA7220 during initialization. + * The file also includes the two functions which use this image. 
+ */ +#include +#include + +#include "ipath_kernel.h" +#include "ipath_registers.h" +#include "ipath_7220.h" + +static unsigned char ipath_sd7220_ib_img[] = { +/*0000*/0x02, 0x0A, 0x29, 0x02, 0x0A, 0x87, 0xE5, 0xE6, + 0x30, 0xE6, 0x04, 0x7F, 0x01, 0x80, 0x02, 0x7F, +/*0010*/0x00, 0xE5, 0xE2, 0x30, 0xE4, 0x04, 0x7E, 0x01, + 0x80, 0x02, 0x7E, 0x00, 0xEE, 0x5F, 0x60, 0x08, +/*0020*/0x53, 0xF9, 0xF7, 0xE4, 0xF5, 0xFE, 0x80, 0x08, + 0x7F, 0x0A, 0x12, 0x17, 0x31, 0x12, 0x0E, 0xA2, +/*0030*/0x75, 0xFC, 0x08, 0xE4, 0xF5, 0xFD, 0xE5, 0xE7, + 0x20, 0xE7, 0x03, 0x43, 0xF9, 0x08, 0x22, 0x00, +/*0040*/0x01, 0x20, 0x11, 0x00, 0x04, 0x20, 0x00, 0x75, + 0x51, 0x01, 0xE4, 0xF5, 0x52, 0xF5, 0x53, 0xF5, +/*0050*/0x52, 0xF5, 0x7E, 0x7F, 0x04, 0x02, 0x04, 0x38, + 0xC2, 0x36, 0x05, 0x52, 0xE5, 0x52, 0xD3, 0x94, +/*0060*/0x0C, 0x40, 0x05, 0x75, 0x52, 0x01, 0xD2, 0x36, + 0x90, 0x07, 0x0C, 0x74, 0x07, 0xF0, 0xA3, 0x74, +/*0070*/0xFF, 0xF0, 0xE4, 0xF5, 0x0C, 0xA3, 0xF0, 0x90, + 0x07, 0x14, 0xF0, 0xA3, 0xF0, 0x75, 0x0B, 0x20, +/*0080*/0xF5, 0x09, 0xE4, 0xF5, 0x08, 0xE5, 0x08, 0xD3, + 0x94, 0x30, 0x40, 0x03, 0x02, 0x04, 0x04, 0x12, +/*0090*/0x00, 0x06, 0x15, 0x0B, 0xE5, 0x08, 0x70, 0x04, + 0x7F, 0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, 0x09, +/*00A0*/0x70, 0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, + 0xEE, 0x5F, 0x60, 0x05, 0x12, 0x18, 0x71, 0xD2, +/*00B0*/0x35, 0x53, 0xE1, 0xF7, 0xE5, 0x08, 0x45, 0x09, + 0xFF, 0xE5, 0x0B, 0x25, 0xE0, 0x25, 0xE0, 0x24, +/*00C0*/0x83, 0xF5, 0x82, 0xE4, 0x34, 0x07, 0xF5, 0x83, + 0xEF, 0xF0, 0x85, 0xE2, 0x20, 0xE5, 0x52, 0xD3, +/*00D0*/0x94, 0x01, 0x40, 0x0D, 0x12, 0x19, 0xF3, 0xE0, + 0x54, 0xA0, 0x64, 0x40, 0x70, 0x03, 0x02, 0x03, +/*00E0*/0xFB, 0x53, 0xF9, 0xF8, 0x90, 0x94, 0x70, 0xE4, + 0xF0, 0xE0, 0xF5, 0x10, 0xAF, 0x09, 0x12, 0x1E, +/*00F0*/0xB3, 0xAF, 0x08, 0xEF, 0x44, 0x08, 0xF5, 0x82, + 0x75, 0x83, 0x80, 0xE0, 0xF5, 0x29, 0xEF, 0x44, +/*0100*/0x07, 0x12, 0x1A, 0x3C, 0xF5, 0x22, 0x54, 0x40, + 0xD3, 0x94, 0x00, 0x40, 0x1E, 0xE5, 0x29, 0x54, +/*0110*/0xF0, 0x70, 
0x21, 0x12, 0x19, 0xF3, 0xE0, 0x44, + 0x80, 0xF0, 0xE5, 0x22, 0x54, 0x30, 0x65, 0x08, +/*0120*/0x70, 0x09, 0x12, 0x19, 0xF3, 0xE0, 0x54, 0xBF, + 0xF0, 0x80, 0x09, 0x12, 0x19, 0xF3, 0x74, 0x40, +/*0130*/0xF0, 0x02, 0x03, 0xFB, 0x12, 0x1A, 0x12, 0x75, + 0x83, 0xAE, 0x74, 0xFF, 0xF0, 0xAF, 0x08, 0x7E, +/*0140*/0x00, 0xEF, 0x44, 0x07, 0xF5, 0x82, 0xE0, 0xFD, + 0xE5, 0x0B, 0x25, 0xE0, 0x25, 0xE0, 0x24, 0x81, +/*0150*/0xF5, 0x82, 0xE4, 0x34, 0x07, 0xF5, 0x83, 0xED, + 0xF0, 0x90, 0x07, 0x0E, 0xE0, 0x04, 0xF0, 0xEF, +/*0160*/0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, 0x98, 0xE0, + 0xF5, 0x28, 0x12, 0x1A, 0x23, 0x40, 0x0C, 0x12, +/*0170*/0x19, 0xF3, 0xE0, 0x44, 0x01, 0x12, 0x1A, 0x32, + 0x02, 0x03, 0xF6, 0xAF, 0x08, 0x7E, 0x00, 0x74, +/*0180*/0x80, 0xCD, 0xEF, 0xCD, 0x8D, 0x82, 0xF5, 0x83, + 0xE0, 0x30, 0xE0, 0x0A, 0x12, 0x19, 0xF3, 0xE0, +/*0190*/0x44, 0x20, 0xF0, 0x02, 0x03, 0xFB, 0x12, 0x19, + 0xF3, 0xE0, 0x54, 0xDF, 0xF0, 0xEE, 0x44, 0xAE, +/*01A0*/0x12, 0x1A, 0x43, 0x30, 0xE4, 0x03, 0x02, 0x03, + 0xFB, 0x74, 0x9E, 0x12, 0x1A, 0x05, 0x20, 0xE0, +/*01B0*/0x03, 0x02, 0x03, 0xFB, 0x8F, 0x82, 0x8E, 0x83, + 0xE0, 0x20, 0xE0, 0x03, 0x02, 0x03, 0xFB, 0x12, +/*01C0*/0x19, 0xF3, 0xE0, 0x44, 0x10, 0xF0, 0xE5, 0xE3, + 0x20, 0xE7, 0x08, 0xE5, 0x08, 0x12, 0x1A, 0x3A, +/*01D0*/0x44, 0x04, 0xF0, 0xAF, 0x08, 0x7E, 0x00, 0xEF, + 0x12, 0x1A, 0x3A, 0x20, 0xE2, 0x34, 0x12, 0x19, +/*01E0*/0xF3, 0xE0, 0x44, 0x08, 0xF0, 0xE5, 0xE4, 0x30, + 0xE6, 0x04, 0x7D, 0x01, 0x80, 0x02, 0x7D, 0x00, +/*01F0*/0xE5, 0x7E, 0xC3, 0x94, 0x04, 0x50, 0x04, 0x7C, + 0x01, 0x80, 0x02, 0x7C, 0x00, 0xEC, 0x4D, 0x60, +/*0200*/0x05, 0xC2, 0x35, 0x02, 0x03, 0xFB, 0xEE, 0x44, + 0xD2, 0x12, 0x1A, 0x43, 0x44, 0x40, 0xF0, 0x02, +/*0210*/0x03, 0xFB, 0x12, 0x19, 0xF3, 0xE0, 0x54, 0xF7, + 0xF0, 0x12, 0x1A, 0x12, 0x75, 0x83, 0xD2, 0xE0, +/*0220*/0x54, 0xBF, 0xF0, 0x90, 0x07, 0x14, 0xE0, 0x04, + 0xF0, 0xE5, 0x7E, 0x70, 0x03, 0x75, 0x7E, 0x01, +/*0230*/0xAF, 0x08, 0x7E, 0x00, 0x12, 0x1A, 0x23, 0x40, + 0x12, 0x12, 0x19, 0xF3, 0xE0, 0x44, 
0x01, 0x12, +/*0240*/0x19, 0xF2, 0xE0, 0x54, 0x02, 0x12, 0x1A, 0x32, + 0x02, 0x03, 0xFB, 0x12, 0x19, 0xF3, 0xE0, 0x44, +/*0250*/0x02, 0x12, 0x19, 0xF2, 0xE0, 0x54, 0xFE, 0xF0, + 0xC2, 0x35, 0xEE, 0x44, 0x8A, 0x8F, 0x82, 0xF5, +/*0260*/0x83, 0xE0, 0xF5, 0x17, 0x54, 0x8F, 0x44, 0x40, + 0xF0, 0x74, 0x90, 0xFC, 0xE5, 0x08, 0x44, 0x07, +/*0270*/0xFD, 0xF5, 0x82, 0x8C, 0x83, 0xE0, 0x54, 0x3F, + 0x90, 0x07, 0x02, 0xF0, 0xE0, 0x54, 0xC0, 0x8D, +/*0280*/0x82, 0x8C, 0x83, 0xF0, 0x74, 0x92, 0x12, 0x1A, + 0x05, 0x90, 0x07, 0x03, 0x12, 0x1A, 0x19, 0x74, +/*0290*/0x82, 0x12, 0x1A, 0x05, 0x90, 0x07, 0x04, 0x12, + 0x1A, 0x19, 0x74, 0xB4, 0x12, 0x1A, 0x05, 0x90, +/*02A0*/0x07, 0x05, 0x12, 0x1A, 0x19, 0x74, 0x94, 0xFE, + 0xE5, 0x08, 0x44, 0x06, 0x12, 0x1A, 0x0A, 0xF5, +/*02B0*/0x10, 0x30, 0xE0, 0x04, 0xD2, 0x37, 0x80, 0x02, + 0xC2, 0x37, 0xE5, 0x10, 0x54, 0x7F, 0x8F, 0x82, +/*02C0*/0x8E, 0x83, 0xF0, 0x30, 0x44, 0x30, 0x12, 0x1A, + 0x03, 0x54, 0x80, 0xD3, 0x94, 0x00, 0x40, 0x04, +/*02D0*/0xD2, 0x39, 0x80, 0x02, 0xC2, 0x39, 0x8F, 0x82, + 0x8E, 0x83, 0xE0, 0x44, 0x80, 0xF0, 0x12, 0x1A, +/*02E0*/0x03, 0x54, 0x40, 0xD3, 0x94, 0x00, 0x40, 0x04, + 0xD2, 0x3A, 0x80, 0x02, 0xC2, 0x3A, 0x8F, 0x82, +/*02F0*/0x8E, 0x83, 0xE0, 0x44, 0x40, 0xF0, 0x74, 0x92, + 0xFE, 0xE5, 0x08, 0x44, 0x06, 0x12, 0x1A, 0x0A, +/*0300*/0x30, 0xE7, 0x04, 0xD2, 0x38, 0x80, 0x02, 0xC2, + 0x38, 0x8F, 0x82, 0x8E, 0x83, 0xE0, 0x54, 0x7F, +/*0310*/0xF0, 0x12, 0x1E, 0x46, 0xE4, 0xF5, 0x0A, 0x20, + 0x03, 0x02, 0x80, 0x03, 0x30, 0x43, 0x03, 0x12, +/*0320*/0x19, 0x95, 0x20, 0x02, 0x02, 0x80, 0x03, 0x30, + 0x42, 0x03, 0x12, 0x0C, 0x8F, 0x30, 0x30, 0x06, +/*0330*/0x12, 0x19, 0x95, 0x12, 0x0C, 0x8F, 0x12, 0x0D, + 0x47, 0x12, 0x19, 0xF3, 0xE0, 0x54, 0xFB, 0xF0, +/*0340*/0xE5, 0x0A, 0xC3, 0x94, 0x01, 0x40, 0x46, 0x43, + 0xE1, 0x08, 0x12, 0x19, 0xF3, 0xE0, 0x44, 0x04, +/*0350*/0xF0, 0xE5, 0xE4, 0x20, 0xE7, 0x2A, 0x12, 0x1A, + 0x12, 0x75, 0x83, 0xD2, 0xE0, 0x54, 0x08, 0xD3, +/*0360*/0x94, 0x00, 0x40, 0x04, 0x7F, 0x01, 0x80, 0x02, + 
0x7F, 0x00, 0xE5, 0x0A, 0xC3, 0x94, 0x01, 0x40, +/*0370*/0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEF, + 0x5E, 0x60, 0x05, 0x12, 0x1D, 0xD7, 0x80, 0x17, +/*0380*/0x12, 0x1A, 0x12, 0x75, 0x83, 0xD2, 0xE0, 0x44, + 0x08, 0xF0, 0x02, 0x03, 0xFB, 0x12, 0x1A, 0x12, +/*0390*/0x75, 0x83, 0xD2, 0xE0, 0x54, 0xF7, 0xF0, 0x12, + 0x1E, 0x46, 0x7F, 0x08, 0x12, 0x17, 0x31, 0x74, +/*03A0*/0x8E, 0xFE, 0x12, 0x1A, 0x12, 0x8E, 0x83, 0xE0, + 0xF5, 0x10, 0x54, 0xFE, 0xF0, 0xE5, 0x10, 0x44, +/*03B0*/0x01, 0xFF, 0xE5, 0x08, 0xFD, 0xED, 0x44, 0x07, + 0xF5, 0x82, 0xEF, 0xF0, 0xE5, 0x10, 0x54, 0xFE, +/*03C0*/0xFF, 0xED, 0x44, 0x07, 0xF5, 0x82, 0xEF, 0x12, + 0x1A, 0x11, 0x75, 0x83, 0x86, 0xE0, 0x44, 0x10, +/*03D0*/0x12, 0x1A, 0x11, 0xE0, 0x44, 0x10, 0xF0, 0x12, + 0x19, 0xF3, 0xE0, 0x54, 0xFD, 0x44, 0x01, 0xFF, +/*03E0*/0x12, 0x19, 0xF3, 0xEF, 0x12, 0x1A, 0x32, 0x30, + 0x32, 0x0C, 0xE5, 0x08, 0x44, 0x08, 0xF5, 0x82, +/*03F0*/0x75, 0x83, 0x82, 0x74, 0x05, 0xF0, 0xAF, 0x0B, + 0x12, 0x18, 0xD7, 0x74, 0x10, 0x25, 0x08, 0xF5, +/*0400*/0x08, 0x02, 0x00, 0x85, 0x05, 0x09, 0xE5, 0x09, + 0xD3, 0x94, 0x07, 0x50, 0x03, 0x02, 0x00, 0x82, +/*0410*/0xE5, 0x7E, 0xD3, 0x94, 0x00, 0x40, 0x04, 0x7F, + 0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, 0x7E, 0xC3, +/*0420*/0x94, 0xFA, 0x50, 0x04, 0x7E, 0x01, 0x80, 0x02, + 0x7E, 0x00, 0xEE, 0x5F, 0x60, 0x02, 0x05, 0x7E, +/*0430*/0x30, 0x35, 0x0B, 0x43, 0xE1, 0x01, 0x7F, 0x09, + 0x12, 0x17, 0x31, 0x02, 0x00, 0x58, 0x53, 0xE1, +/*0440*/0xFE, 0x02, 0x00, 0x58, 0x8E, 0x6A, 0x8F, 0x6B, + 0x8C, 0x6C, 0x8D, 0x6D, 0x75, 0x6E, 0x01, 0x75, +/*0450*/0x6F, 0x01, 0x75, 0x70, 0x01, 0xE4, 0xF5, 0x73, + 0xF5, 0x74, 0xF5, 0x75, 0x90, 0x07, 0x2F, 0xF0, +/*0460*/0xF5, 0x3C, 0xF5, 0x3E, 0xF5, 0x46, 0xF5, 0x47, + 0xF5, 0x3D, 0xF5, 0x3F, 0xF5, 0x6F, 0xE5, 0x6F, +/*0470*/0x70, 0x0F, 0xE5, 0x6B, 0x45, 0x6A, 0x12, 0x07, + 0x2A, 0x75, 0x83, 0x80, 0x74, 0x3A, 0xF0, 0x80, +/*0480*/0x09, 0x12, 0x07, 0x2A, 0x75, 0x83, 0x80, 0x74, + 0x1A, 0xF0, 0xE4, 0xF5, 0x6E, 0xC3, 0x74, 0x3F, +/*0490*/0x95, 0x6E, 
0xFF, 0x12, 0x08, 0x65, 0x75, 0x83, + 0x82, 0xEF, 0xF0, 0x12, 0x1A, 0x4D, 0x12, 0x08, +/*04A0*/0xC6, 0xE5, 0x33, 0xF0, 0x12, 0x08, 0xFA, 0x12, + 0x08, 0xB1, 0x40, 0xE1, 0xE5, 0x6F, 0x70, 0x0B, +/*04B0*/0x12, 0x07, 0x2A, 0x75, 0x83, 0x80, 0x74, 0x36, + 0xF0, 0x80, 0x09, 0x12, 0x07, 0x2A, 0x75, 0x83, +/*04C0*/0x80, 0x74, 0x16, 0xF0, 0x75, 0x6E, 0x01, 0x12, + 0x07, 0x2A, 0x75, 0x83, 0xB4, 0xE5, 0x6E, 0xF0, +/*04D0*/0x12, 0x1A, 0x4D, 0x74, 0x3F, 0x25, 0x6E, 0xF5, + 0x82, 0xE4, 0x34, 0x00, 0xF5, 0x83, 0xE5, 0x33, +/*04E0*/0xF0, 0x74, 0xBF, 0x25, 0x6E, 0xF5, 0x82, 0xE4, + 0x34, 0x00, 0x12, 0x08, 0xB1, 0x40, 0xD8, 0xE4, +/*04F0*/0xF5, 0x70, 0xF5, 0x46, 0xF5, 0x47, 0xF5, 0x6E, + 0x12, 0x08, 0xFA, 0xF5, 0x83, 0xE0, 0xFE, 0x12, +/*0500*/0x08, 0xC6, 0xE0, 0x7C, 0x00, 0x24, 0x00, 0xFF, + 0xEC, 0x3E, 0xFE, 0xAD, 0x3B, 0xD3, 0xEF, 0x9D, +/*0510*/0xEE, 0x9C, 0x50, 0x04, 0x7B, 0x01, 0x80, 0x02, + 0x7B, 0x00, 0xE5, 0x70, 0x70, 0x04, 0x7A, 0x01, +/*0520*/0x80, 0x02, 0x7A, 0x00, 0xEB, 0x5A, 0x60, 0x06, + 0x85, 0x6E, 0x46, 0x75, 0x70, 0x01, 0xD3, 0xEF, +/*0530*/0x9D, 0xEE, 0x9C, 0x50, 0x04, 0x7F, 0x01, 0x80, + 0x02, 0x7F, 0x00, 0xE5, 0x70, 0xB4, 0x01, 0x04, +/*0540*/0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEF, 0x5E, + 0x60, 0x03, 0x85, 0x6E, 0x47, 0x05, 0x6E, 0xE5, +/*0550*/0x6E, 0x64, 0x7F, 0x70, 0xA3, 0xE5, 0x46, 0x60, + 0x05, 0xE5, 0x47, 0xB4, 0x7E, 0x03, 0x85, 0x46, +/*0560*/0x47, 0xE5, 0x6F, 0x70, 0x08, 0x85, 0x46, 0x76, + 0x85, 0x47, 0x77, 0x80, 0x0E, 0xC3, 0x74, 0x7F, +/*0570*/0x95, 0x46, 0xF5, 0x78, 0xC3, 0x74, 0x7F, 0x95, + 0x47, 0xF5, 0x79, 0xE5, 0x6F, 0x70, 0x37, 0xE5, +/*0580*/0x46, 0x65, 0x47, 0x70, 0x0C, 0x75, 0x73, 0x01, + 0x75, 0x74, 0x01, 0xF5, 0x3C, 0xF5, 0x3D, 0x80, +/*0590*/0x35, 0xE4, 0xF5, 0x4E, 0xC3, 0xE5, 0x47, 0x95, + 0x46, 0xF5, 0x3C, 0xC3, 0x13, 0xF5, 0x71, 0x25, +/*05A0*/0x46, 0xF5, 0x72, 0xC3, 0x94, 0x3F, 0x40, 0x05, + 0xE4, 0xF5, 0x3D, 0x80, 0x40, 0xC3, 0x74, 0x3F, +/*05B0*/0x95, 0x72, 0xF5, 0x3D, 0x80, 0x37, 0xE5, 0x46, + 0x65, 0x47, 0x70, 0x0F, 0x75, 0x73, 
0x01, 0x75, +/*05C0*/0x75, 0x01, 0xF5, 0x3E, 0xF5, 0x3F, 0x75, 0x4E, + 0x01, 0x80, 0x22, 0xE4, 0xF5, 0x4E, 0xC3, 0xE5, +/*05D0*/0x47, 0x95, 0x46, 0xF5, 0x3E, 0xC3, 0x13, 0xF5, + 0x71, 0x25, 0x46, 0xF5, 0x72, 0xD3, 0x94, 0x3F, +/*05E0*/0x50, 0x05, 0xE4, 0xF5, 0x3F, 0x80, 0x06, 0xE5, + 0x72, 0x24, 0xC1, 0xF5, 0x3F, 0x05, 0x6F, 0xE5, +/*05F0*/0x6F, 0xC3, 0x94, 0x02, 0x50, 0x03, 0x02, 0x04, + 0x6E, 0xE5, 0x6D, 0x45, 0x6C, 0x70, 0x02, 0x80, +/*0600*/0x04, 0xE5, 0x74, 0x45, 0x75, 0x90, 0x07, 0x2F, + 0xF0, 0x7F, 0x01, 0xE5, 0x3E, 0x60, 0x04, 0xE5, +/*0610*/0x3C, 0x70, 0x14, 0xE4, 0xF5, 0x3C, 0xF5, 0x3D, + 0xF5, 0x3E, 0xF5, 0x3F, 0x12, 0x08, 0xD2, 0x70, +/*0620*/0x04, 0xF0, 0x02, 0x06, 0xA4, 0x80, 0x7A, 0xE5, + 0x3C, 0xC3, 0x95, 0x3E, 0x40, 0x07, 0xE5, 0x3C, +/*0630*/0x95, 0x3E, 0xFF, 0x80, 0x06, 0xC3, 0xE5, 0x3E, + 0x95, 0x3C, 0xFF, 0xE5, 0x76, 0xD3, 0x95, 0x79, +/*0640*/0x40, 0x05, 0x85, 0x76, 0x7A, 0x80, 0x03, 0x85, + 0x79, 0x7A, 0xE5, 0x77, 0xC3, 0x95, 0x78, 0x50, +/*0650*/0x05, 0x85, 0x77, 0x7B, 0x80, 0x03, 0x85, 0x78, + 0x7B, 0xE5, 0x7B, 0xD3, 0x95, 0x7A, 0x40, 0x30, +/*0660*/0xE5, 0x7B, 0x95, 0x7A, 0xF5, 0x3C, 0xF5, 0x3E, + 0xC3, 0xE5, 0x7B, 0x95, 0x7A, 0x90, 0x07, 0x19, +/*0670*/0xF0, 0xE5, 0x3C, 0xC3, 0x13, 0xF5, 0x71, 0x25, + 0x7A, 0xF5, 0x72, 0xC3, 0x94, 0x3F, 0x40, 0x05, +/*0680*/0xE4, 0xF5, 0x3D, 0x80, 0x1F, 0xC3, 0x74, 0x3F, + 0x95, 0x72, 0xF5, 0x3D, 0xF5, 0x3F, 0x80, 0x14, +/*0690*/0xE4, 0xF5, 0x3C, 0xF5, 0x3E, 0x90, 0x07, 0x19, + 0xF0, 0x12, 0x08, 0xD2, 0x70, 0x03, 0xF0, 0x80, +/*06A0*/0x03, 0x74, 0x01, 0xF0, 0x12, 0x08, 0x65, 0x75, + 0x83, 0xD0, 0xE0, 0x54, 0x0F, 0xFE, 0xAD, 0x3C, +/*06B0*/0x70, 0x02, 0x7E, 0x07, 0xBE, 0x0F, 0x02, 0x7E, + 0x80, 0xEE, 0xFB, 0xEF, 0xD3, 0x9B, 0x74, 0x80, +/*06C0*/0xF8, 0x98, 0x40, 0x1F, 0xE4, 0xF5, 0x3C, 0xF5, + 0x3E, 0x12, 0x08, 0xD2, 0x70, 0x03, 0xF0, 0x80, +/*06D0*/0x12, 0x74, 0x01, 0xF0, 0xE5, 0x08, 0xFB, 0xEB, + 0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, 0xD2, 0xE0, +/*06E0*/0x44, 0x10, 0xF0, 0xE5, 0x08, 0xFB, 0xEB, 0x44, + 
0x09, 0xF5, 0x82, 0x75, 0x83, 0x9E, 0xED, 0xF0, +/*06F0*/0xEB, 0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, 0xCA, + 0xED, 0xF0, 0x12, 0x08, 0x65, 0x75, 0x83, 0xCC, +/*0700*/0xEF, 0xF0, 0x22, 0xE5, 0x08, 0x44, 0x07, 0xF5, + 0x82, 0x75, 0x83, 0xBC, 0xE0, 0x54, 0xF0, 0xF0, +/*0710*/0xE5, 0x08, 0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, + 0xBE, 0xE0, 0x54, 0xF0, 0xF0, 0xE5, 0x08, 0x44, +/*0720*/0x07, 0xF5, 0x82, 0x75, 0x83, 0xC0, 0xE0, 0x54, + 0xF0, 0xF0, 0xE5, 0x08, 0x44, 0x07, 0xF5, 0x82, +/*0730*/0x22, 0xF0, 0x90, 0x07, 0x28, 0xE0, 0xFE, 0xA3, + 0xE0, 0xF5, 0x82, 0x8E, 0x83, 0x22, 0x85, 0x42, +/*0740*/0x42, 0x85, 0x41, 0x41, 0x85, 0x40, 0x40, 0x74, + 0xC0, 0x2F, 0xF5, 0x82, 0x74, 0x02, 0x3E, 0xF5, +/*0750*/0x83, 0xE5, 0x42, 0xF0, 0x74, 0xE0, 0x2F, 0xF5, + 0x82, 0x74, 0x02, 0x3E, 0xF5, 0x83, 0x22, 0xE5, +/*0760*/0x42, 0x29, 0xFD, 0xE4, 0x33, 0xFC, 0xE5, 0x3C, + 0xC3, 0x9D, 0xEC, 0x64, 0x80, 0xF8, 0x74, 0x80, +/*0770*/0x98, 0x22, 0xF5, 0x83, 0xE0, 0x90, 0x07, 0x22, + 0x54, 0x1F, 0xFD, 0xE0, 0xFA, 0xA3, 0xE0, 0xF5, +/*0780*/0x82, 0x8A, 0x83, 0xED, 0xF0, 0x22, 0x90, 0x07, + 0x22, 0xE0, 0xFC, 0xA3, 0xE0, 0xF5, 0x82, 0x8C, +/*0790*/0x83, 0x22, 0x90, 0x07, 0x24, 0xFF, 0xED, 0x44, + 0x07, 0xCF, 0xF0, 0xA3, 0xEF, 0xF0, 0x22, 0x85, +/*07A0*/0x38, 0x38, 0x85, 0x39, 0x39, 0x85, 0x3A, 0x3A, + 0x74, 0xC0, 0x2F, 0xF5, 0x82, 0x74, 0x02, 0x3E, +/*07B0*/0xF5, 0x83, 0x22, 0x90, 0x07, 0x26, 0xFF, 0xED, + 0x44, 0x07, 0xCF, 0xF0, 0xA3, 0xEF, 0xF0, 0x22, +/*07C0*/0xF0, 0x74, 0xA0, 0x2F, 0xF5, 0x82, 0x74, 0x02, + 0x3E, 0xF5, 0x83, 0x22, 0x74, 0xC0, 0x25, 0x11, +/*07D0*/0xF5, 0x82, 0xE4, 0x34, 0x01, 0xF5, 0x83, 0x22, + 0x74, 0x00, 0x25, 0x11, 0xF5, 0x82, 0xE4, 0x34, +/*07E0*/0x02, 0xF5, 0x83, 0x22, 0x74, 0x60, 0x25, 0x11, + 0xF5, 0x82, 0xE4, 0x34, 0x03, 0xF5, 0x83, 0x22, +/*07F0*/0x74, 0x80, 0x25, 0x11, 0xF5, 0x82, 0xE4, 0x34, + 0x03, 0xF5, 0x83, 0x22, 0x74, 0xE0, 0x25, 0x11, +/*0800*/0xF5, 0x82, 0xE4, 0x34, 0x03, 0xF5, 0x83, 0x22, + 0x74, 0x40, 0x25, 0x11, 0xF5, 0x82, 0xE4, 0x34, +/*0810*/0x06, 0xF5, 
0x83, 0x22, 0x74, 0x80, 0x2F, 0xF5, + 0x82, 0x74, 0x02, 0x3E, 0xF5, 0x83, 0x22, 0xAF, +/*0820*/0x08, 0x7E, 0x00, 0xEF, 0x44, 0x07, 0xF5, 0x82, + 0x22, 0xF5, 0x83, 0xE5, 0x82, 0x44, 0x07, 0xF5, +/*0830*/0x82, 0xE5, 0x40, 0xF0, 0x22, 0x74, 0x40, 0x25, + 0x11, 0xF5, 0x82, 0xE4, 0x34, 0x02, 0xF5, 0x83, +/*0840*/0x22, 0x74, 0xC0, 0x25, 0x11, 0xF5, 0x82, 0xE4, + 0x34, 0x03, 0xF5, 0x83, 0x22, 0x74, 0x00, 0x25, +/*0850*/0x11, 0xF5, 0x82, 0xE4, 0x34, 0x06, 0xF5, 0x83, + 0x22, 0x74, 0x20, 0x25, 0x11, 0xF5, 0x82, 0xE4, +/*0860*/0x34, 0x06, 0xF5, 0x83, 0x22, 0xE5, 0x08, 0xFD, + 0xED, 0x44, 0x07, 0xF5, 0x82, 0x22, 0xE5, 0x41, +/*0870*/0xF0, 0xE5, 0x65, 0x64, 0x01, 0x45, 0x64, 0x22, + 0x7E, 0x00, 0xFB, 0x7A, 0x00, 0xFD, 0x7C, 0x00, +/*0880*/0x22, 0x74, 0x20, 0x25, 0x11, 0xF5, 0x82, 0xE4, + 0x34, 0x02, 0x22, 0x74, 0xA0, 0x25, 0x11, 0xF5, +/*0890*/0x82, 0xE4, 0x34, 0x03, 0x22, 0x85, 0x3E, 0x42, + 0x85, 0x3F, 0x41, 0x8F, 0x40, 0x22, 0x85, 0x3C, +/*08A0*/0x42, 0x85, 0x3D, 0x41, 0x8F, 0x40, 0x22, 0x75, + 0x45, 0x3F, 0x90, 0x07, 0x20, 0xE4, 0xF0, 0xA3, +/*08B0*/0x22, 0xF5, 0x83, 0xE5, 0x32, 0xF0, 0x05, 0x6E, + 0xE5, 0x6E, 0xC3, 0x94, 0x40, 0x22, 0xF0, 0xE5, +/*08C0*/0x08, 0x44, 0x06, 0xF5, 0x82, 0x22, 0x74, 0x00, + 0x25, 0x6E, 0xF5, 0x82, 0xE4, 0x34, 0x00, 0xF5, +/*08D0*/0x83, 0x22, 0xE5, 0x6D, 0x45, 0x6C, 0x90, 0x07, + 0x2F, 0x22, 0xE4, 0xF9, 0xE5, 0x3C, 0xD3, 0x95, +/*08E0*/0x3E, 0x22, 0x74, 0x80, 0x2E, 0xF5, 0x82, 0xE4, + 0x34, 0x02, 0xF5, 0x83, 0xE0, 0x22, 0x74, 0xA0, +/*08F0*/0x2E, 0xF5, 0x82, 0xE4, 0x34, 0x02, 0xF5, 0x83, + 0xE0, 0x22, 0x74, 0x80, 0x25, 0x6E, 0xF5, 0x82, +/*0900*/0xE4, 0x34, 0x00, 0x22, 0x25, 0x42, 0xFD, 0xE4, + 0x33, 0xFC, 0x22, 0x85, 0x42, 0x42, 0x85, 0x41, +/*0910*/0x41, 0x85, 0x40, 0x40, 0x22, 0xED, 0x4C, 0x60, + 0x03, 0x02, 0x09, 0xE5, 0xEF, 0x4E, 0x70, 0x37, +/*0920*/0x90, 0x07, 0x26, 0x12, 0x07, 0x89, 0xE0, 0xFD, + 0x12, 0x07, 0xCC, 0xED, 0xF0, 0x90, 0x07, 0x28, +/*0930*/0x12, 0x07, 0x89, 0xE0, 0xFD, 0x12, 0x07, 0xD8, + 0xED, 0xF0, 0x12, 0x07, 0x86, 0xE0, 
0x54, 0x1F, +/*0940*/0xFD, 0x12, 0x08, 0x81, 0xF5, 0x83, 0xED, 0xF0, + 0x90, 0x07, 0x24, 0x12, 0x07, 0x89, 0xE0, 0x54, +/*0950*/0x1F, 0xFD, 0x12, 0x08, 0x35, 0xED, 0xF0, 0xEF, + 0x64, 0x04, 0x4E, 0x70, 0x37, 0x90, 0x07, 0x26, +/*0960*/0x12, 0x07, 0x89, 0xE0, 0xFD, 0x12, 0x07, 0xE4, + 0xED, 0xF0, 0x90, 0x07, 0x28, 0x12, 0x07, 0x89, +/*0970*/0xE0, 0xFD, 0x12, 0x07, 0xF0, 0xED, 0xF0, 0x12, + 0x07, 0x86, 0xE0, 0x54, 0x1F, 0xFD, 0x12, 0x08, +/*0980*/0x8B, 0xF5, 0x83, 0xED, 0xF0, 0x90, 0x07, 0x24, + 0x12, 0x07, 0x89, 0xE0, 0x54, 0x1F, 0xFD, 0x12, +/*0990*/0x08, 0x41, 0xED, 0xF0, 0xEF, 0x64, 0x01, 0x4E, + 0x70, 0x04, 0x7D, 0x01, 0x80, 0x02, 0x7D, 0x00, +/*09A0*/0xEF, 0x64, 0x02, 0x4E, 0x70, 0x04, 0x7F, 0x01, + 0x80, 0x02, 0x7F, 0x00, 0xEF, 0x4D, 0x60, 0x78, +/*09B0*/0x90, 0x07, 0x26, 0x12, 0x07, 0x35, 0xE0, 0xFF, + 0x12, 0x07, 0xFC, 0xEF, 0x12, 0x07, 0x31, 0xE0, +/*09C0*/0xFF, 0x12, 0x08, 0x08, 0xEF, 0xF0, 0x90, 0x07, + 0x22, 0x12, 0x07, 0x35, 0xE0, 0x54, 0x1F, 0xFF, +/*09D0*/0x12, 0x08, 0x4D, 0xEF, 0xF0, 0x90, 0x07, 0x24, + 0x12, 0x07, 0x35, 0xE0, 0x54, 0x1F, 0xFF, 0x12, +/*09E0*/0x08, 0x59, 0xEF, 0xF0, 0x22, 0x12, 0x07, 0xCC, + 0xE4, 0xF0, 0x12, 0x07, 0xD8, 0xE4, 0xF0, 0x12, +/*09F0*/0x08, 0x81, 0xF5, 0x83, 0xE4, 0xF0, 0x12, 0x08, + 0x35, 0x74, 0x14, 0xF0, 0x12, 0x07, 0xE4, 0xE4, +/*0A00*/0xF0, 0x12, 0x07, 0xF0, 0xE4, 0xF0, 0x12, 0x08, + 0x8B, 0xF5, 0x83, 0xE4, 0xF0, 0x12, 0x08, 0x41, +/*0A10*/0x74, 0x14, 0xF0, 0x12, 0x07, 0xFC, 0xE4, 0xF0, + 0x12, 0x08, 0x08, 0xE4, 0xF0, 0x12, 0x08, 0x4D, +/*0A20*/0xE4, 0xF0, 0x12, 0x08, 0x59, 0x74, 0x14, 0xF0, + 0x22, 0x53, 0xF9, 0xF7, 0x75, 0xFC, 0x10, 0xE4, +/*0A30*/0xF5, 0xFD, 0x75, 0xFE, 0x30, 0xF5, 0xFF, 0xE5, + 0xE7, 0x20, 0xE7, 0x03, 0x43, 0xF9, 0x08, 0xE5, +/*0A40*/0xE6, 0x20, 0xE7, 0x0B, 0x78, 0xFF, 0xE4, 0xF6, + 0xD8, 0xFD, 0x53, 0xE6, 0xFE, 0x80, 0x09, 0x78, +/*0A50*/0x08, 0xE4, 0xF6, 0xD8, 0xFD, 0x53, 0xE6, 0xFE, + 0x75, 0x81, 0x80, 0xE4, 0xF5, 0xA8, 0xD2, 0xA8, +/*0A60*/0xC2, 0xA9, 0xD2, 0xAF, 0xE5, 0xE2, 0x20, 0xE5, + 
0x05, 0x20, 0xE6, 0x02, 0x80, 0x03, 0x43, 0xE1, +/*0A70*/0x02, 0xE5, 0xE2, 0x20, 0xE0, 0x0E, 0x90, 0x00, + 0x00, 0x7F, 0x00, 0x7E, 0x08, 0xE4, 0xF0, 0xA3, +/*0A80*/0xDF, 0xFC, 0xDE, 0xFA, 0x02, 0x0A, 0xDB, 0x43, + 0xFA, 0x01, 0xC0, 0xE0, 0xC0, 0xF0, 0xC0, 0x83, +/*0A90*/0xC0, 0x82, 0xC0, 0xD0, 0x12, 0x1C, 0xE7, 0xD0, + 0xD0, 0xD0, 0x82, 0xD0, 0x83, 0xD0, 0xF0, 0xD0, +/*0AA0*/0xE0, 0x53, 0xFA, 0xFE, 0x32, 0x02, 0x1B, 0x55, + 0xE4, 0x93, 0xA3, 0xF8, 0xE4, 0x93, 0xA3, 0xF6, +/*0AB0*/0x08, 0xDF, 0xF9, 0x80, 0x29, 0xE4, 0x93, 0xA3, + 0xF8, 0x54, 0x07, 0x24, 0x0C, 0xC8, 0xC3, 0x33, +/*0AC0*/0xC4, 0x54, 0x0F, 0x44, 0x20, 0xC8, 0x83, 0x40, + 0x04, 0xF4, 0x56, 0x80, 0x01, 0x46, 0xF6, 0xDF, +/*0AD0*/0xE4, 0x80, 0x0B, 0x01, 0x02, 0x04, 0x08, 0x10, + 0x20, 0x40, 0x80, 0x90, 0x00, 0x3F, 0xE4, 0x7E, +/*0AE0*/0x01, 0x93, 0x60, 0xC1, 0xA3, 0xFF, 0x54, 0x3F, + 0x30, 0xE5, 0x09, 0x54, 0x1F, 0xFE, 0xE4, 0x93, +/*0AF0*/0xA3, 0x60, 0x01, 0x0E, 0xCF, 0x54, 0xC0, 0x25, + 0xE0, 0x60, 0xAD, 0x40, 0xB8, 0x80, 0xFE, 0x8C, +/*0B00*/0x64, 0x8D, 0x65, 0x8A, 0x66, 0x8B, 0x67, 0xE4, + 0xF5, 0x69, 0xEF, 0x4E, 0x70, 0x03, 0x02, 0x1D, +/*0B10*/0x55, 0xE4, 0xF5, 0x68, 0xE5, 0x67, 0x45, 0x66, + 0x70, 0x32, 0x12, 0x07, 0x2A, 0x75, 0x83, 0x90, +/*0B20*/0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC2, 0xE4, + 0x12, 0x07, 0x29, 0x75, 0x83, 0xC4, 0xE4, 0x12, +/*0B30*/0x08, 0x70, 0x70, 0x29, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0x92, 0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, +/*0B40*/0xC6, 0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC8, + 0xE4, 0xF0, 0x80, 0x11, 0x90, 0x07, 0x26, 0x12, +/*0B50*/0x07, 0x35, 0xE4, 0x12, 0x08, 0x70, 0x70, 0x05, + 0x12, 0x07, 0x32, 0xE4, 0xF0, 0x12, 0x1D, 0x55, +/*0B60*/0x12, 0x1E, 0xBF, 0xE5, 0x67, 0x45, 0x66, 0x70, + 0x33, 0x12, 0x07, 0x2A, 0x75, 0x83, 0x90, 0xE5, +/*0B70*/0x41, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC2, 0xE5, + 0x41, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC4, 0x12, +/*0B80*/0x08, 0x6E, 0x70, 0x29, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0x92, 0xE5, 0x40, 0x12, 0x07, 0x29, 0x75, +/*0B90*/0x83, 0xC6, 
0xE5, 0x40, 0x12, 0x07, 0x29, 0x75, + 0x83, 0xC8, 0x80, 0x0E, 0x90, 0x07, 0x26, 0x12, +/*0BA0*/0x07, 0x35, 0x12, 0x08, 0x6E, 0x70, 0x06, 0x12, + 0x07, 0x32, 0xE5, 0x40, 0xF0, 0xAF, 0x69, 0x7E, +/*0BB0*/0x00, 0xAD, 0x67, 0xAC, 0x66, 0x12, 0x04, 0x44, + 0x12, 0x07, 0x2A, 0x75, 0x83, 0xCA, 0xE0, 0xD3, +/*0BC0*/0x94, 0x00, 0x50, 0x0C, 0x05, 0x68, 0xE5, 0x68, + 0xC3, 0x94, 0x05, 0x50, 0x03, 0x02, 0x0B, 0x14, +/*0BD0*/0x22, 0x8C, 0x60, 0x8D, 0x61, 0x12, 0x08, 0xDA, + 0x74, 0x20, 0x40, 0x0D, 0x2F, 0xF5, 0x82, 0x74, +/*0BE0*/0x03, 0x3E, 0xF5, 0x83, 0xE5, 0x3E, 0xF0, 0x80, + 0x0B, 0x2F, 0xF5, 0x82, 0x74, 0x03, 0x3E, 0xF5, +/*0BF0*/0x83, 0xE5, 0x3C, 0xF0, 0xE5, 0x3C, 0xD3, 0x95, + 0x3E, 0x40, 0x3C, 0xE5, 0x61, 0x45, 0x60, 0x70, +/*0C00*/0x10, 0xE9, 0x12, 0x09, 0x04, 0xE5, 0x3E, 0x12, + 0x07, 0x68, 0x40, 0x3B, 0x12, 0x08, 0x95, 0x80, +/*0C10*/0x18, 0xE5, 0x3E, 0xC3, 0x95, 0x38, 0x40, 0x1D, + 0x85, 0x3E, 0x38, 0xE5, 0x3E, 0x60, 0x05, 0x85, +/*0C20*/0x3F, 0x39, 0x80, 0x03, 0x85, 0x39, 0x39, 0x8F, + 0x3A, 0x12, 0x08, 0x14, 0xE5, 0x3E, 0x12, 0x07, +/*0C30*/0xC0, 0xE5, 0x3F, 0xF0, 0x22, 0x80, 0x43, 0xE5, + 0x61, 0x45, 0x60, 0x70, 0x19, 0x12, 0x07, 0x5F, +/*0C40*/0x40, 0x05, 0x12, 0x08, 0x9E, 0x80, 0x27, 0x12, + 0x09, 0x0B, 0x12, 0x08, 0x14, 0xE5, 0x42, 0x12, +/*0C50*/0x07, 0xC0, 0xE5, 0x41, 0xF0, 0x22, 0xE5, 0x3C, + 0xC3, 0x95, 0x38, 0x40, 0x1D, 0x85, 0x3C, 0x38, +/*0C60*/0xE5, 0x3C, 0x60, 0x05, 0x85, 0x3D, 0x39, 0x80, + 0x03, 0x85, 0x39, 0x39, 0x8F, 0x3A, 0x12, 0x08, +/*0C70*/0x14, 0xE5, 0x3C, 0x12, 0x07, 0xC0, 0xE5, 0x3D, + 0xF0, 0x22, 0x85, 0x38, 0x38, 0x85, 0x39, 0x39, +/*0C80*/0x85, 0x3A, 0x3A, 0x12, 0x08, 0x14, 0xE5, 0x38, + 0x12, 0x07, 0xC0, 0xE5, 0x39, 0xF0, 0x22, 0x7F, +/*0C90*/0x06, 0x12, 0x17, 0x31, 0x12, 0x1D, 0x23, 0x12, + 0x0E, 0x04, 0x12, 0x0E, 0x33, 0xE0, 0x44, 0x0A, +/*0CA0*/0xF0, 0x74, 0x8E, 0xFE, 0x12, 0x0E, 0x04, 0x12, + 0x0E, 0x0B, 0xEF, 0xF0, 0xE5, 0x28, 0x30, 0xE5, +/*0CB0*/0x03, 0xD3, 0x80, 0x01, 0xC3, 0x40, 0x05, 0x75, + 0x14, 0x20, 0x80, 0x03, 0x75, 0x14, 
0x08, 0x12, +/*0CC0*/0x0E, 0x04, 0x75, 0x83, 0x8A, 0xE5, 0x14, 0xF0, + 0xB4, 0xFF, 0x05, 0x75, 0x12, 0x80, 0x80, 0x06, +/*0CD0*/0xE5, 0x14, 0xC3, 0x13, 0xF5, 0x12, 0xE4, 0xF5, + 0x16, 0xF5, 0x7F, 0x12, 0x19, 0x36, 0x12, 0x13, +/*0CE0*/0xA3, 0xE5, 0x0A, 0xC3, 0x94, 0x01, 0x50, 0x09, + 0x05, 0x16, 0xE5, 0x16, 0xC3, 0x94, 0x14, 0x40, +/*0CF0*/0xEA, 0xE5, 0xE4, 0x20, 0xE7, 0x28, 0x12, 0x0E, + 0x04, 0x75, 0x83, 0xD2, 0xE0, 0x54, 0x08, 0xD3, +/*0D00*/0x94, 0x00, 0x40, 0x04, 0x7F, 0x01, 0x80, 0x02, + 0x7F, 0x00, 0xE5, 0x0A, 0xC3, 0x94, 0x01, 0x40, +/*0D10*/0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEF, + 0x5E, 0x60, 0x03, 0x12, 0x1D, 0xD7, 0xE5, 0x7F, +/*0D20*/0xC3, 0x94, 0x11, 0x40, 0x14, 0x12, 0x0E, 0x04, + 0x75, 0x83, 0xD2, 0xE0, 0x44, 0x80, 0xF0, 0xE5, +/*0D30*/0xE4, 0x20, 0xE7, 0x0F, 0x12, 0x1D, 0xD7, 0x80, + 0x0A, 0x12, 0x0E, 0x04, 0x75, 0x83, 0xD2, 0xE0, +/*0D40*/0x54, 0x7F, 0xF0, 0x12, 0x1D, 0x23, 0x22, 0x74, + 0x8A, 0x85, 0x08, 0x82, 0xF5, 0x83, 0xE5, 0x17, +/*0D50*/0xF0, 0x12, 0x0E, 0x3A, 0xE4, 0xF0, 0x90, 0x07, + 0x02, 0xE0, 0x12, 0x0E, 0x17, 0x75, 0x83, 0x90, +/*0D60*/0xEF, 0xF0, 0x74, 0x92, 0xFE, 0xE5, 0x08, 0x44, + 0x07, 0xFF, 0xF5, 0x82, 0x8E, 0x83, 0xE0, 0x54, +/*0D70*/0xC0, 0xFD, 0x90, 0x07, 0x03, 0xE0, 0x54, 0x3F, + 0x4D, 0x8F, 0x82, 0x8E, 0x83, 0xF0, 0x90, 0x07, +/*0D80*/0x04, 0xE0, 0x12, 0x0E, 0x17, 0x75, 0x83, 0x82, + 0xEF, 0xF0, 0x90, 0x07, 0x05, 0xE0, 0xFF, 0xED, +/*0D90*/0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, 0xB4, 0xEF, + 0x12, 0x0E, 0x03, 0x75, 0x83, 0x80, 0xE0, 0x54, +/*0DA0*/0xBF, 0xF0, 0x30, 0x37, 0x0A, 0x12, 0x0E, 0x91, + 0x75, 0x83, 0x94, 0xE0, 0x44, 0x80, 0xF0, 0x30, +/*0DB0*/0x38, 0x0A, 0x12, 0x0E, 0x91, 0x75, 0x83, 0x92, + 0xE0, 0x44, 0x80, 0xF0, 0xE5, 0x28, 0x30, 0xE4, +/*0DC0*/0x1A, 0x20, 0x39, 0x0A, 0x12, 0x0E, 0x04, 0x75, + 0x83, 0x88, 0xE0, 0x54, 0x7F, 0xF0, 0x20, 0x3A, +/*0DD0*/0x0A, 0x12, 0x0E, 0x04, 0x75, 0x83, 0x88, 0xE0, + 0x54, 0xBF, 0xF0, 0x74, 0x8C, 0xFE, 0x12, 0x0E, +/*0DE0*/0x04, 0x8E, 0x83, 0xE0, 0x54, 0x0F, 0x12, 0x0E, + 
0x03, 0x75, 0x83, 0x86, 0xE0, 0x54, 0xBF, 0xF0, +/*0DF0*/0xE5, 0x08, 0x44, 0x06, 0x12, 0x0D, 0xFD, 0x75, + 0x83, 0x8A, 0xE4, 0xF0, 0x22, 0xF5, 0x82, 0x75, +/*0E00*/0x83, 0x82, 0xE4, 0xF0, 0xE5, 0x08, 0x44, 0x07, + 0xF5, 0x82, 0x22, 0x8E, 0x83, 0xE0, 0xF5, 0x10, +/*0E10*/0x54, 0xFE, 0xF0, 0xE5, 0x10, 0x44, 0x01, 0xFF, + 0xE5, 0x08, 0xFD, 0xED, 0x44, 0x07, 0xF5, 0x82, +/*0E20*/0x22, 0xE5, 0x15, 0xC4, 0x54, 0x07, 0xFF, 0xE5, + 0x08, 0xFD, 0xED, 0x44, 0x08, 0xF5, 0x82, 0x75, +/*0E30*/0x83, 0x82, 0x22, 0x75, 0x83, 0x80, 0xE0, 0x44, + 0x40, 0xF0, 0xE5, 0x08, 0x44, 0x08, 0xF5, 0x82, +/*0E40*/0x75, 0x83, 0x8A, 0x22, 0xE5, 0x16, 0x25, 0xE0, + 0x25, 0xE0, 0x24, 0xAF, 0xF5, 0x82, 0xE4, 0x34, +/*0E50*/0x1A, 0xF5, 0x83, 0xE4, 0x93, 0xF5, 0x0D, 0x22, + 0x43, 0xE1, 0x10, 0x43, 0xE1, 0x80, 0x53, 0xE1, +/*0E60*/0xFD, 0x85, 0xE1, 0x10, 0x22, 0xE5, 0x16, 0x25, + 0xE0, 0x25, 0xE0, 0x24, 0xB2, 0xF5, 0x82, 0xE4, +/*0E70*/0x34, 0x1A, 0xF5, 0x83, 0xE4, 0x93, 0x22, 0x85, + 0x55, 0x82, 0x85, 0x54, 0x83, 0xE5, 0x15, 0xF0, +/*0E80*/0x22, 0xE5, 0xE2, 0x54, 0x20, 0xD3, 0x94, 0x00, + 0x22, 0xE5, 0xE2, 0x54, 0x40, 0xD3, 0x94, 0x00, +/*0E90*/0x22, 0xE5, 0x08, 0x44, 0x06, 0xF5, 0x82, 0x22, + 0xFD, 0xE5, 0x08, 0xFB, 0xEB, 0x44, 0x07, 0xF5, +/*0EA0*/0x82, 0x22, 0x53, 0xF9, 0xF7, 0x75, 0xFE, 0x30, + 0x22, 0xEF, 0x4E, 0x70, 0x26, 0x12, 0x07, 0xCC, +/*0EB0*/0xE0, 0xFD, 0x90, 0x07, 0x26, 0x12, 0x07, 0x7B, + 0x12, 0x07, 0xD8, 0xE0, 0xFD, 0x90, 0x07, 0x28, +/*0EC0*/0x12, 0x07, 0x7B, 0x12, 0x08, 0x81, 0x12, 0x07, + 0x72, 0x12, 0x08, 0x35, 0xE0, 0x90, 0x07, 0x24, +/*0ED0*/0x12, 0x07, 0x78, 0xEF, 0x64, 0x04, 0x4E, 0x70, + 0x29, 0x12, 0x07, 0xE4, 0xE0, 0xFD, 0x90, 0x07, +/*0EE0*/0x26, 0x12, 0x07, 0x7B, 0x12, 0x07, 0xF0, 0xE0, + 0xFD, 0x90, 0x07, 0x28, 0x12, 0x07, 0x7B, 0x12, +/*0EF0*/0x08, 0x8B, 0x12, 0x07, 0x72, 0x12, 0x08, 0x41, + 0xE0, 0x54, 0x1F, 0xFD, 0x90, 0x07, 0x24, 0x12, +/*0F00*/0x07, 0x7B, 0xEF, 0x64, 0x01, 0x4E, 0x70, 0x04, + 0x7D, 0x01, 0x80, 0x02, 0x7D, 0x00, 0xEF, 0x64, +/*0F10*/0x02, 0x4E, 
0x70, 0x04, 0x7F, 0x01, 0x80, 0x02, + 0x7F, 0x00, 0xEF, 0x4D, 0x60, 0x35, 0x12, 0x07, +/*0F20*/0xFC, 0xE0, 0xFF, 0x90, 0x07, 0x26, 0x12, 0x07, + 0x89, 0xEF, 0xF0, 0x12, 0x08, 0x08, 0xE0, 0xFF, +/*0F30*/0x90, 0x07, 0x28, 0x12, 0x07, 0x89, 0xEF, 0xF0, + 0x12, 0x08, 0x4D, 0xE0, 0x54, 0x1F, 0xFF, 0x12, +/*0F40*/0x07, 0x86, 0xEF, 0xF0, 0x12, 0x08, 0x59, 0xE0, + 0x54, 0x1F, 0xFF, 0x90, 0x07, 0x24, 0x12, 0x07, +/*0F50*/0x89, 0xEF, 0xF0, 0x22, 0xE4, 0xF5, 0x53, 0x12, + 0x0E, 0x81, 0x40, 0x04, 0x7F, 0x01, 0x80, 0x02, +/*0F60*/0x7F, 0x00, 0x12, 0x0E, 0x89, 0x40, 0x04, 0x7E, + 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, 0x4F, 0x70, +/*0F70*/0x03, 0x02, 0x0F, 0xF6, 0x85, 0xE1, 0x10, 0x43, + 0xE1, 0x02, 0x53, 0xE1, 0x0F, 0x85, 0xE1, 0x10, +/*0F80*/0xE4, 0xF5, 0x51, 0xE5, 0xE3, 0x54, 0x3F, 0xF5, + 0x52, 0x12, 0x0E, 0x89, 0x40, 0x1D, 0xAD, 0x52, +/*0F90*/0xAF, 0x51, 0x12, 0x11, 0x18, 0xEF, 0x60, 0x08, + 0x85, 0xE1, 0x10, 0x43, 0xE1, 0x40, 0x80, 0x0B, +/*0FA0*/0x53, 0xE1, 0xBF, 0x12, 0x0E, 0x58, 0x12, 0x00, + 0x06, 0x80, 0xFB, 0xE5, 0xE3, 0x54, 0x3F, 0xF5, +/*0FB0*/0x51, 0xE5, 0xE4, 0x54, 0x3F, 0xF5, 0x52, 0x12, + 0x0E, 0x81, 0x40, 0x1D, 0xAD, 0x52, 0xAF, 0x51, +/*0FC0*/0x12, 0x11, 0x18, 0xEF, 0x60, 0x08, 0x85, 0xE1, + 0x10, 0x43, 0xE1, 0x20, 0x80, 0x0B, 0x53, 0xE1, +/*0FD0*/0xDF, 0x12, 0x0E, 0x58, 0x12, 0x00, 0x06, 0x80, + 0xFB, 0x12, 0x0E, 0x81, 0x40, 0x04, 0x7F, 0x01, +/*0FE0*/0x80, 0x02, 0x7F, 0x00, 0x12, 0x0E, 0x89, 0x40, + 0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, +/*0FF0*/0x4F, 0x60, 0x03, 0x12, 0x0E, 0x5B, 0x22, 0x12, + 0x0E, 0x21, 0xEF, 0xF0, 0x12, 0x10, 0x91, 0x22, +/*1000*/0x02, 0x11, 0x00, 0x02, 0x10, 0x40, 0x02, 0x10, + 0x90, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1010*/0x01, 0x20, 0x01, 0x20, 0xE4, 0xF5, 0x57, 0x12, + 0x16, 0xBD, 0x12, 0x16, 0x44, 0xE4, 0x12, 0x10, +/*1020*/0x56, 0x12, 0x14, 0xB7, 0x90, 0x07, 0x26, 0x12, + 0x07, 0x35, 0xE4, 0x12, 0x07, 0x31, 0xE4, 0xF0, +/*1030*/0x12, 0x10, 0x56, 0x12, 0x14, 0xB7, 0x90, 0x07, + 0x26, 0x12, 0x07, 0x35, 0xE5, 0x41, 
0x12, 0x07, +/*1040*/0x31, 0xE5, 0x40, 0xF0, 0xAF, 0x57, 0x7E, 0x00, + 0xAD, 0x56, 0x7C, 0x00, 0x12, 0x04, 0x44, 0xAF, +/*1050*/0x56, 0x7E, 0x00, 0x02, 0x11, 0xEE, 0xFF, 0x90, + 0x07, 0x20, 0xA3, 0xE0, 0xFD, 0xE4, 0xF5, 0x56, +/*1060*/0xF5, 0x40, 0xFE, 0xFC, 0xAB, 0x56, 0xFA, 0x12, + 0x11, 0x51, 0x7F, 0x0F, 0x7D, 0x18, 0xE4, 0xF5, +/*1070*/0x56, 0xF5, 0x40, 0xFE, 0xFC, 0xAB, 0x56, 0xFA, + 0x12, 0x15, 0x41, 0xAF, 0x56, 0x7E, 0x00, 0x12, +/*1080*/0x1A, 0xFF, 0xE4, 0xFF, 0xF5, 0x56, 0x7D, 0x1F, + 0xF5, 0x40, 0xFE, 0xFC, 0xAB, 0x56, 0xFA, 0x22, +/*1090*/0x22, 0xE4, 0xF5, 0x55, 0xE5, 0x08, 0xFD, 0x74, + 0xA0, 0xF5, 0x56, 0xED, 0x44, 0x07, 0xF5, 0x57, +/*10A0*/0xE5, 0x28, 0x30, 0xE5, 0x03, 0xD3, 0x80, 0x01, + 0xC3, 0x40, 0x05, 0x7F, 0x28, 0xEF, 0x80, 0x04, +/*10B0*/0x7F, 0x14, 0xEF, 0xC3, 0x13, 0xF5, 0x54, 0xE4, + 0xF9, 0x12, 0x0E, 0x18, 0x75, 0x83, 0x8E, 0xE0, +/*10C0*/0xF5, 0x10, 0xCE, 0xEF, 0xCE, 0xEE, 0xD3, 0x94, + 0x00, 0x40, 0x26, 0xE5, 0x10, 0x54, 0xFE, 0x12, +/*10D0*/0x0E, 0x98, 0x75, 0x83, 0x8E, 0xED, 0xF0, 0xE5, + 0x10, 0x44, 0x01, 0xFD, 0xEB, 0x44, 0x07, 0xF5, +/*10E0*/0x82, 0xED, 0xF0, 0x85, 0x57, 0x82, 0x85, 0x56, + 0x83, 0xE0, 0x30, 0xE3, 0x01, 0x09, 0x1E, 0x80, +/*10F0*/0xD4, 0xC2, 0x34, 0xE9, 0xC3, 0x95, 0x54, 0x40, + 0x02, 0xD2, 0x34, 0x22, 0x02, 0x00, 0x06, 0x22, +/*1100*/0x30, 0x30, 0x11, 0x90, 0x10, 0x00, 0xE4, 0x93, + 0xF5, 0x10, 0x90, 0x10, 0x10, 0xE4, 0x93, 0xF5, +/*1110*/0x10, 0x12, 0x10, 0x90, 0x12, 0x11, 0x50, 0x22, + 0xE4, 0xFC, 0xC3, 0xED, 0x9F, 0xFA, 0xEF, 0xF5, +/*1120*/0x83, 0x75, 0x82, 0x00, 0x79, 0xFF, 0xE4, 0x93, + 0xCC, 0x6C, 0xCC, 0xA3, 0xD9, 0xF8, 0xDA, 0xF6, +/*1130*/0xE5, 0xE2, 0x30, 0xE4, 0x02, 0x8C, 0xE5, 0xED, + 0x24, 0xFF, 0xFF, 0xEF, 0x75, 0x82, 0xFF, 0xF5, +/*1140*/0x83, 0xE4, 0x93, 0x6C, 0x70, 0x03, 0x7F, 0x01, + 0x22, 0x7F, 0x00, 0x22, 0x22, 0x11, 0x00, 0x00, +/*1150*/0x22, 0x8E, 0x58, 0x8F, 0x59, 0x8C, 0x5A, 0x8D, + 0x5B, 0x8A, 0x5C, 0x8B, 0x5D, 0x75, 0x5E, 0x01, +/*1160*/0xE4, 0xF5, 0x5F, 0xF5, 0x60, 0xF5, 0x62, 0x12, + 
0x07, 0x2A, 0x75, 0x83, 0xD0, 0xE0, 0xFF, 0xC4, +/*1170*/0x54, 0x0F, 0xF5, 0x61, 0x12, 0x1E, 0xA5, 0x85, + 0x59, 0x5E, 0xD3, 0xE5, 0x5E, 0x95, 0x5B, 0xE5, +/*1180*/0x5A, 0x12, 0x07, 0x6B, 0x50, 0x4B, 0x12, 0x07, + 0x03, 0x75, 0x83, 0xBC, 0xE0, 0x45, 0x5E, 0x12, +/*1190*/0x07, 0x29, 0x75, 0x83, 0xBE, 0xE0, 0x45, 0x5E, + 0x12, 0x07, 0x29, 0x75, 0x83, 0xC0, 0xE0, 0x45, +/*11A0*/0x5E, 0xF0, 0xAF, 0x5F, 0xE5, 0x60, 0x12, 0x08, + 0x78, 0x12, 0x0A, 0xFF, 0xAF, 0x62, 0x7E, 0x00, +/*11B0*/0xAD, 0x5D, 0xAC, 0x5C, 0x12, 0x04, 0x44, 0xE5, + 0x61, 0xAF, 0x5E, 0x7E, 0x00, 0xB4, 0x03, 0x05, +/*11C0*/0x12, 0x1E, 0x21, 0x80, 0x07, 0xAD, 0x5D, 0xAC, + 0x5C, 0x12, 0x13, 0x17, 0x05, 0x5E, 0x02, 0x11, +/*11D0*/0x7A, 0x12, 0x07, 0x03, 0x75, 0x83, 0xBC, 0xE0, + 0x45, 0x40, 0x12, 0x07, 0x29, 0x75, 0x83, 0xBE, +/*11E0*/0xE0, 0x45, 0x40, 0x12, 0x07, 0x29, 0x75, 0x83, + 0xC0, 0xE0, 0x45, 0x40, 0xF0, 0x22, 0x8E, 0x58, +/*11F0*/0x8F, 0x59, 0x75, 0x5A, 0x01, 0x79, 0x01, 0x75, + 0x5B, 0x01, 0xE4, 0xFB, 0x12, 0x07, 0x2A, 0x75, +/*1200*/0x83, 0xAE, 0xE0, 0x54, 0x1A, 0xFF, 0x12, 0x08, + 0x65, 0xE0, 0xC4, 0x13, 0x54, 0x07, 0xFE, 0xEF, +/*1210*/0x70, 0x0C, 0xEE, 0x65, 0x35, 0x70, 0x07, 0x90, + 0x07, 0x2F, 0xE0, 0xB4, 0x01, 0x0D, 0xAF, 0x35, +/*1220*/0x7E, 0x00, 0x12, 0x0E, 0xA9, 0xCF, 0xEB, 0xCF, + 0x02, 0x1E, 0x60, 0xE5, 0x59, 0x64, 0x02, 0x45, +/*1230*/0x58, 0x70, 0x04, 0x7F, 0x01, 0x80, 0x02, 0x7F, + 0x00, 0xE5, 0x59, 0x45, 0x58, 0x70, 0x04, 0x7E, +/*1240*/0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, 0x4F, 0x60, + 0x23, 0x85, 0x41, 0x49, 0x85, 0x40, 0x4B, 0xE5, +/*1250*/0x59, 0x45, 0x58, 0x70, 0x2C, 0xAF, 0x5A, 0xFE, + 0xCD, 0xE9, 0xCD, 0xFC, 0xAB, 0x59, 0xAA, 0x58, +/*1260*/0x12, 0x0A, 0xFF, 0xAF, 0x5B, 0x7E, 0x00, 0x12, + 0x1E, 0x60, 0x80, 0x15, 0xAF, 0x5B, 0x7E, 0x00, +/*1270*/0x12, 0x1E, 0x60, 0x90, 0x07, 0x26, 0x12, 0x07, + 0x35, 0xE5, 0x49, 0x12, 0x07, 0x31, 0xE5, 0x4B, +/*1280*/0xF0, 0xE4, 0xFD, 0xAF, 0x35, 0xFE, 0xFC, 0x12, + 0x09, 0x15, 0x22, 0x8C, 0x64, 0x8D, 0x65, 0x12, +/*1290*/0x08, 0xDA, 
0x40, 0x3C, 0xE5, 0x65, 0x45, 0x64, + 0x70, 0x10, 0x12, 0x09, 0x04, 0xC3, 0xE5, 0x3E, +/*12A0*/0x12, 0x07, 0x69, 0x40, 0x3B, 0x12, 0x08, 0x95, + 0x80, 0x18, 0xE5, 0x3E, 0xC3, 0x95, 0x38, 0x40, +/*12B0*/0x1D, 0x85, 0x3E, 0x38, 0xE5, 0x3E, 0x60, 0x05, + 0x85, 0x3F, 0x39, 0x80, 0x03, 0x85, 0x39, 0x39, +/*12C0*/0x8F, 0x3A, 0x12, 0x07, 0xA8, 0xE5, 0x3E, 0x12, + 0x07, 0x53, 0xE5, 0x3F, 0xF0, 0x22, 0x80, 0x3B, +/*12D0*/0xE5, 0x65, 0x45, 0x64, 0x70, 0x11, 0x12, 0x07, + 0x5F, 0x40, 0x05, 0x12, 0x08, 0x9E, 0x80, 0x1F, +/*12E0*/0x12, 0x07, 0x3E, 0xE5, 0x41, 0xF0, 0x22, 0xE5, + 0x3C, 0xC3, 0x95, 0x38, 0x40, 0x1D, 0x85, 0x3C, +/*12F0*/0x38, 0xE5, 0x3C, 0x60, 0x05, 0x85, 0x3D, 0x39, + 0x80, 0x03, 0x85, 0x39, 0x39, 0x8F, 0x3A, 0x12, +/*1300*/0x07, 0xA8, 0xE5, 0x3C, 0x12, 0x07, 0x53, 0xE5, + 0x3D, 0xF0, 0x22, 0x12, 0x07, 0x9F, 0xE5, 0x38, +/*1310*/0x12, 0x07, 0x53, 0xE5, 0x39, 0xF0, 0x22, 0x8C, + 0x63, 0x8D, 0x64, 0x12, 0x08, 0xDA, 0x40, 0x3C, +/*1320*/0xE5, 0x64, 0x45, 0x63, 0x70, 0x10, 0x12, 0x09, + 0x04, 0xC3, 0xE5, 0x3E, 0x12, 0x07, 0x69, 0x40, +/*1330*/0x3B, 0x12, 0x08, 0x95, 0x80, 0x18, 0xE5, 0x3E, + 0xC3, 0x95, 0x38, 0x40, 0x1D, 0x85, 0x3E, 0x38, +/*1340*/0xE5, 0x3E, 0x60, 0x05, 0x85, 0x3F, 0x39, 0x80, + 0x03, 0x85, 0x39, 0x39, 0x8F, 0x3A, 0x12, 0x07, +/*1350*/0xA8, 0xE5, 0x3E, 0x12, 0x07, 0x53, 0xE5, 0x3F, + 0xF0, 0x22, 0x80, 0x3B, 0xE5, 0x64, 0x45, 0x63, +/*1360*/0x70, 0x11, 0x12, 0x07, 0x5F, 0x40, 0x05, 0x12, + 0x08, 0x9E, 0x80, 0x1F, 0x12, 0x07, 0x3E, 0xE5, +/*1370*/0x41, 0xF0, 0x22, 0xE5, 0x3C, 0xC3, 0x95, 0x38, + 0x40, 0x1D, 0x85, 0x3C, 0x38, 0xE5, 0x3C, 0x60, +/*1380*/0x05, 0x85, 0x3D, 0x39, 0x80, 0x03, 0x85, 0x39, + 0x39, 0x8F, 0x3A, 0x12, 0x07, 0xA8, 0xE5, 0x3C, +/*1390*/0x12, 0x07, 0x53, 0xE5, 0x3D, 0xF0, 0x22, 0x12, + 0x07, 0x9F, 0xE5, 0x38, 0x12, 0x07, 0x53, 0xE5, +/*13A0*/0x39, 0xF0, 0x22, 0xE5, 0x0D, 0xFE, 0xE5, 0x08, + 0x8E, 0x54, 0x44, 0x05, 0xF5, 0x55, 0x75, 0x15, +/*13B0*/0x0F, 0xF5, 0x82, 0x12, 0x0E, 0x7A, 0x12, 0x17, + 0xA3, 0x20, 0x31, 0x05, 0x75, 0x15, 
0x03, 0x80, +/*13C0*/0x03, 0x75, 0x15, 0x0B, 0xE5, 0x0A, 0xC3, 0x94, + 0x01, 0x50, 0x38, 0x12, 0x14, 0x20, 0x20, 0x31, +/*13D0*/0x06, 0x05, 0x15, 0x05, 0x15, 0x80, 0x04, 0x15, + 0x15, 0x15, 0x15, 0xE5, 0x0A, 0xC3, 0x94, 0x01, +/*13E0*/0x50, 0x21, 0x12, 0x14, 0x20, 0x20, 0x31, 0x04, + 0x05, 0x15, 0x80, 0x02, 0x15, 0x15, 0xE5, 0x0A, +/*13F0*/0xC3, 0x94, 0x01, 0x50, 0x0E, 0x12, 0x0E, 0x77, + 0x12, 0x17, 0xA3, 0x20, 0x31, 0x05, 0x05, 0x15, +/*1400*/0x12, 0x0E, 0x77, 0xE5, 0x15, 0xB4, 0x08, 0x04, + 0x7F, 0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, 0x15, +/*1410*/0xB4, 0x07, 0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, + 0x00, 0xEE, 0x4F, 0x60, 0x02, 0x05, 0x7F, 0x22, +/*1420*/0x85, 0x55, 0x82, 0x85, 0x54, 0x83, 0xE5, 0x15, + 0xF0, 0x12, 0x17, 0xA3, 0x22, 0x12, 0x07, 0x2A, +/*1430*/0x75, 0x83, 0xAE, 0x74, 0xFF, 0x12, 0x07, 0x29, + 0xE0, 0x54, 0x1A, 0xF5, 0x34, 0xE0, 0xC4, 0x13, +/*1440*/0x54, 0x07, 0xF5, 0x35, 0x24, 0xFE, 0x60, 0x24, + 0x24, 0xFE, 0x60, 0x3C, 0x24, 0x04, 0x70, 0x63, +/*1450*/0x75, 0x31, 0x2D, 0xE5, 0x08, 0xFD, 0x74, 0xB6, + 0x12, 0x07, 0x92, 0x74, 0xBC, 0x90, 0x07, 0x22, +/*1460*/0x12, 0x07, 0x95, 0x74, 0x90, 0x12, 0x07, 0xB3, + 0x74, 0x92, 0x80, 0x3C, 0x75, 0x31, 0x3A, 0xE5, +/*1470*/0x08, 0xFD, 0x74, 0xBA, 0x12, 0x07, 0x92, 0x74, + 0xC0, 0x90, 0x07, 0x22, 0x12, 0x07, 0xB6, 0x74, +/*1480*/0xC4, 0x12, 0x07, 0xB3, 0x74, 0xC8, 0x80, 0x20, + 0x75, 0x31, 0x35, 0xE5, 0x08, 0xFD, 0x74, 0xB8, +/*1490*/0x12, 0x07, 0x92, 0x74, 0xBE, 0xFF, 0xED, 0x44, + 0x07, 0x90, 0x07, 0x22, 0xCF, 0xF0, 0xA3, 0xEF, +/*14A0*/0xF0, 0x74, 0xC2, 0x12, 0x07, 0xB3, 0x74, 0xC6, + 0xFF, 0xED, 0x44, 0x07, 0xA3, 0xCF, 0xF0, 0xA3, +/*14B0*/0xEF, 0xF0, 0x22, 0x75, 0x34, 0x01, 0x22, 0x8E, + 0x58, 0x8F, 0x59, 0x8C, 0x5A, 0x8D, 0x5B, 0x8A, +/*14C0*/0x5C, 0x8B, 0x5D, 0x75, 0x5E, 0x01, 0xE4, 0xF5, + 0x5F, 0x12, 0x1E, 0xA5, 0x85, 0x59, 0x5E, 0xD3, +/*14D0*/0xE5, 0x5E, 0x95, 0x5B, 0xE5, 0x5A, 0x12, 0x07, + 0x6B, 0x50, 0x57, 0xE5, 0x5D, 0x45, 0x5C, 0x70, +/*14E0*/0x30, 0x12, 0x07, 0x2A, 0x75, 0x83, 0x92, 0xE5, + 
0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC6, 0xE5, +/*14F0*/0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC8, 0xE5, + 0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0x90, 0xE5, +/*1500*/0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC2, 0xE5, + 0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC4, 0x80, +/*1510*/0x03, 0x12, 0x07, 0x32, 0xE5, 0x5E, 0xF0, 0xAF, + 0x5F, 0x7E, 0x00, 0xAD, 0x5D, 0xAC, 0x5C, 0x12, +/*1520*/0x04, 0x44, 0xAF, 0x5E, 0x7E, 0x00, 0xAD, 0x5D, + 0xAC, 0x5C, 0x12, 0x0B, 0xD1, 0x05, 0x5E, 0x02, +/*1530*/0x14, 0xCF, 0xAB, 0x5D, 0xAA, 0x5C, 0xAD, 0x5B, + 0xAC, 0x5A, 0xAF, 0x59, 0xAE, 0x58, 0x02, 0x1B, +/*1540*/0xFB, 0x8C, 0x5C, 0x8D, 0x5D, 0x8A, 0x5E, 0x8B, + 0x5F, 0x75, 0x60, 0x01, 0xE4, 0xF5, 0x61, 0xF5, +/*1550*/0x62, 0xF5, 0x63, 0x12, 0x1E, 0xA5, 0x8F, 0x60, + 0xD3, 0xE5, 0x60, 0x95, 0x5D, 0xE5, 0x5C, 0x12, +/*1560*/0x07, 0x6B, 0x50, 0x61, 0xE5, 0x5F, 0x45, 0x5E, + 0x70, 0x27, 0x12, 0x07, 0x2A, 0x75, 0x83, 0xB6, +/*1570*/0xE5, 0x60, 0x12, 0x07, 0x29, 0x75, 0x83, 0xB8, + 0xE5, 0x60, 0x12, 0x07, 0x29, 0x75, 0x83, 0xBA, +/*1580*/0xE5, 0x60, 0xF0, 0xAF, 0x61, 0x7E, 0x00, 0xE5, + 0x62, 0x12, 0x08, 0x7A, 0x12, 0x0A, 0xFF, 0x80, +/*1590*/0x19, 0x90, 0x07, 0x24, 0x12, 0x07, 0x35, 0xE5, + 0x60, 0x12, 0x07, 0x29, 0x75, 0x83, 0x8E, 0xE4, +/*15A0*/0x12, 0x07, 0x29, 0x74, 0x01, 0x12, 0x07, 0x29, + 0xE4, 0xF0, 0xAF, 0x63, 0x7E, 0x00, 0xAD, 0x5F, +/*15B0*/0xAC, 0x5E, 0x12, 0x04, 0x44, 0xAF, 0x60, 0x7E, + 0x00, 0xAD, 0x5F, 0xAC, 0x5E, 0x12, 0x12, 0x8B, +/*15C0*/0x05, 0x60, 0x02, 0x15, 0x58, 0x22, 0x90, 0x11, + 0x4D, 0xE4, 0x93, 0x90, 0x07, 0x2E, 0xF0, 0x12, +/*15D0*/0x08, 0x1F, 0x75, 0x83, 0xAE, 0xE0, 0x54, 0x1A, + 0xF5, 0x34, 0x70, 0x67, 0xEF, 0x44, 0x07, 0xF5, +/*15E0*/0x82, 0x75, 0x83, 0xCE, 0xE0, 0xFF, 0x13, 0x13, + 0x13, 0x54, 0x07, 0xF5, 0x36, 0x54, 0x0F, 0xD3, +/*15F0*/0x94, 0x00, 0x40, 0x06, 0x12, 0x14, 0x2D, 0x12, + 0x1B, 0xA9, 0xE5, 0x36, 0x54, 0x0F, 0x24, 0xFE, +/*1600*/0x60, 0x0C, 0x14, 0x60, 0x0C, 0x14, 0x60, 0x19, + 0x24, 0x03, 0x70, 0x37, 0x80, 0x10, 0x02, 0x1E, +/*1610*/0x91, 0x12, 
0x1E, 0x91, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0xCE, 0xE0, 0x54, 0xEF, 0xF0, 0x02, 0x1D, +/*1620*/0xAE, 0x12, 0x10, 0x14, 0xE4, 0xF5, 0x55, 0x12, + 0x1D, 0x85, 0x05, 0x55, 0xE5, 0x55, 0xC3, 0x94, +/*1630*/0x05, 0x40, 0xF4, 0x12, 0x07, 0x2A, 0x75, 0x83, + 0xCE, 0xE0, 0x54, 0xC7, 0x12, 0x07, 0x29, 0xE0, +/*1640*/0x44, 0x08, 0xF0, 0x22, 0xE4, 0xF5, 0x58, 0xF5, + 0x59, 0xAF, 0x08, 0xEF, 0x44, 0x07, 0xF5, 0x82, +/*1650*/0x75, 0x83, 0xD0, 0xE0, 0xFD, 0xC4, 0x54, 0x0F, + 0xF5, 0x5A, 0xEF, 0x44, 0x07, 0xF5, 0x82, 0x75, +/*1660*/0x83, 0x80, 0x74, 0x01, 0xF0, 0x12, 0x08, 0x21, + 0x75, 0x83, 0x82, 0xE5, 0x45, 0xF0, 0xEF, 0x44, +/*1670*/0x07, 0xF5, 0x82, 0x75, 0x83, 0x8A, 0x74, 0xFF, + 0xF0, 0x12, 0x1A, 0x4D, 0x12, 0x07, 0x2A, 0x75, +/*1680*/0x83, 0xBC, 0xE0, 0x54, 0xEF, 0x12, 0x07, 0x29, + 0x75, 0x83, 0xBE, 0xE0, 0x54, 0xEF, 0x12, 0x07, +/*1690*/0x29, 0x75, 0x83, 0xC0, 0xE0, 0x54, 0xEF, 0x12, + 0x07, 0x29, 0x75, 0x83, 0xBC, 0xE0, 0x44, 0x10, +/*16A0*/0x12, 0x07, 0x29, 0x75, 0x83, 0xBE, 0xE0, 0x44, + 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC0, 0xE0, +/*16B0*/0x44, 0x10, 0xF0, 0xAF, 0x58, 0xE5, 0x59, 0x12, + 0x08, 0x78, 0x02, 0x0A, 0xFF, 0xE4, 0xF5, 0x58, +/*16C0*/0x7D, 0x01, 0xF5, 0x59, 0xAF, 0x35, 0xFE, 0xFC, + 0x12, 0x09, 0x15, 0x12, 0x07, 0x2A, 0x75, 0x83, +/*16D0*/0xB6, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, + 0xB8, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, +/*16E0*/0xBA, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, + 0xBC, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, +/*16F0*/0xBE, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, + 0xC0, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, +/*1700*/0x90, 0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC2, + 0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC4, 0xE4, +/*1710*/0x12, 0x07, 0x29, 0x75, 0x83, 0x92, 0xE4, 0x12, + 0x07, 0x29, 0x75, 0x83, 0xC6, 0xE4, 0x12, 0x07, +/*1720*/0x29, 0x75, 0x83, 0xC8, 0xE4, 0xF0, 0xAF, 0x58, + 0xFE, 0xE5, 0x59, 0x12, 0x08, 0x7A, 0x02, 0x0A, +/*1730*/0xFF, 0xE5, 0xE2, 0x30, 0xE4, 0x6C, 0xE5, 0xE7, + 0x54, 0xC0, 0x64, 0x40, 0x70, 0x64, 
0xE5, 0x09, +/*1740*/0xC4, 0x54, 0x30, 0xFE, 0xE5, 0x08, 0x25, 0xE0, + 0x25, 0xE0, 0x54, 0xC0, 0x4E, 0xFE, 0xEF, 0x54, +/*1750*/0x3F, 0x4E, 0xFD, 0xE5, 0x2B, 0xAE, 0x2A, 0x78, + 0x02, 0xC3, 0x33, 0xCE, 0x33, 0xCE, 0xD8, 0xF9, +/*1760*/0xF5, 0x82, 0x8E, 0x83, 0xED, 0xF0, 0xE5, 0x2B, + 0xAE, 0x2A, 0x78, 0x02, 0xC3, 0x33, 0xCE, 0x33, +/*1770*/0xCE, 0xD8, 0xF9, 0xFF, 0xF5, 0x82, 0x8E, 0x83, + 0xA3, 0xE5, 0xFE, 0xF0, 0x8F, 0x82, 0x8E, 0x83, +/*1780*/0xA3, 0xA3, 0xE5, 0xFD, 0xF0, 0x8F, 0x82, 0x8E, + 0x83, 0xA3, 0xA3, 0xA3, 0xE5, 0xFC, 0xF0, 0xC3, +/*1790*/0xE5, 0x2B, 0x94, 0xFA, 0xE5, 0x2A, 0x94, 0x00, + 0x50, 0x08, 0x05, 0x2B, 0xE5, 0x2B, 0x70, 0x02, +/*17A0*/0x05, 0x2A, 0x22, 0xE4, 0xFF, 0xE4, 0xF5, 0x58, + 0xF5, 0x56, 0xF5, 0x57, 0x74, 0x82, 0xFC, 0x12, +/*17B0*/0x0E, 0x04, 0x8C, 0x83, 0xE0, 0xF5, 0x10, 0x54, + 0x7F, 0xF0, 0xE5, 0x10, 0x44, 0x80, 0x12, 0x0E, +/*17C0*/0x98, 0xED, 0xF0, 0x7E, 0x0A, 0x12, 0x0E, 0x04, + 0x75, 0x83, 0xA0, 0xE0, 0x20, 0xE0, 0x26, 0xDE, +/*17D0*/0xF4, 0x05, 0x57, 0xE5, 0x57, 0x70, 0x02, 0x05, + 0x56, 0xE5, 0x14, 0x24, 0x01, 0xFD, 0xE4, 0x33, +/*17E0*/0xFC, 0xD3, 0xE5, 0x57, 0x9D, 0xE5, 0x56, 0x9C, + 0x40, 0xD9, 0xE5, 0x0A, 0x94, 0x20, 0x50, 0x02, +/*17F0*/0x05, 0x0A, 0x43, 0xE1, 0x08, 0xC2, 0x31, 0x12, + 0x0E, 0x04, 0x75, 0x83, 0xA6, 0xE0, 0x55, 0x12, +/*1800*/0x65, 0x12, 0x70, 0x03, 0xD2, 0x31, 0x22, 0xC2, + 0x31, 0x22, 0x90, 0x07, 0x26, 0xE0, 0xFA, 0xA3, +/*1810*/0xE0, 0xF5, 0x82, 0x8A, 0x83, 0xE0, 0xF5, 0x41, + 0xE5, 0x39, 0xC3, 0x95, 0x41, 0x40, 0x26, 0xE5, +/*1820*/0x39, 0x95, 0x41, 0xC3, 0x9F, 0xEE, 0x12, 0x07, + 0x6B, 0x40, 0x04, 0x7C, 0x01, 0x80, 0x02, 0x7C, +/*1830*/0x00, 0xE5, 0x41, 0x64, 0x3F, 0x60, 0x04, 0x7B, + 0x01, 0x80, 0x02, 0x7B, 0x00, 0xEC, 0x5B, 0x60, +/*1840*/0x29, 0x05, 0x41, 0x80, 0x28, 0xC3, 0xE5, 0x41, + 0x95, 0x39, 0xC3, 0x9F, 0xEE, 0x12, 0x07, 0x6B, +/*1850*/0x40, 0x04, 0x7F, 0x01, 0x80, 0x02, 0x7F, 0x00, + 0xE5, 0x41, 0x60, 0x04, 0x7E, 0x01, 0x80, 0x02, +/*1860*/0x7E, 0x00, 0xEF, 0x5E, 0x60, 0x04, 0x15, 0x41, + 
0x80, 0x03, 0x85, 0x39, 0x41, 0x85, 0x3A, 0x40, +/*1870*/0x22, 0xE5, 0xE2, 0x30, 0xE4, 0x60, 0xE5, 0xE1, + 0x30, 0xE2, 0x5B, 0xE5, 0x09, 0x70, 0x04, 0x7F, +/*1880*/0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, 0x08, 0x70, + 0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, +/*1890*/0x5F, 0x60, 0x43, 0x53, 0xF9, 0xF8, 0xE5, 0xE2, + 0x30, 0xE4, 0x3B, 0xE5, 0xE1, 0x30, 0xE2, 0x2E, +/*18A0*/0x43, 0xFA, 0x02, 0x53, 0xFA, 0xFB, 0xE4, 0xF5, + 0x10, 0x90, 0x94, 0x70, 0xE5, 0x10, 0xF0, 0xE5, +/*18B0*/0xE1, 0x30, 0xE2, 0xE7, 0x90, 0x94, 0x70, 0xE0, + 0x65, 0x10, 0x60, 0x03, 0x43, 0xFA, 0x04, 0x05, +/*18C0*/0x10, 0x90, 0x94, 0x70, 0xE5, 0x10, 0xF0, 0x70, + 0xE6, 0x12, 0x00, 0x06, 0x80, 0xE1, 0x53, 0xFA, +/*18D0*/0xFD, 0x53, 0xFA, 0xFB, 0x80, 0xC0, 0x22, 0x8F, + 0x54, 0x12, 0x00, 0x06, 0xE5, 0xE1, 0x30, 0xE0, +/*18E0*/0x04, 0x7F, 0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, + 0x7E, 0xD3, 0x94, 0x05, 0x40, 0x04, 0x7E, 0x01, +/*18F0*/0x80, 0x02, 0x7E, 0x00, 0xEE, 0x4F, 0x60, 0x3D, + 0x85, 0x54, 0x11, 0xE5, 0xE2, 0x20, 0xE1, 0x32, +/*1900*/0x74, 0xCE, 0x12, 0x1A, 0x05, 0x30, 0xE7, 0x04, + 0x7D, 0x01, 0x80, 0x02, 0x7D, 0x00, 0x8F, 0x82, +/*1910*/0x8E, 0x83, 0xE0, 0x30, 0xE6, 0x04, 0x7F, 0x01, + 0x80, 0x02, 0x7F, 0x00, 0xEF, 0x5D, 0x70, 0x15, +/*1920*/0x12, 0x15, 0xC6, 0x74, 0xCE, 0x12, 0x1A, 0x05, + 0x30, 0xE6, 0x07, 0xE0, 0x44, 0x80, 0xF0, 0x43, +/*1930*/0xF9, 0x80, 0x12, 0x18, 0x71, 0x22, 0x12, 0x0E, + 0x44, 0xE5, 0x16, 0x25, 0xE0, 0x25, 0xE0, 0x24, +/*1940*/0xB0, 0xF5, 0x82, 0xE4, 0x34, 0x1A, 0xF5, 0x83, + 0xE4, 0x93, 0xF5, 0x0F, 0xE5, 0x16, 0x25, 0xE0, +/*1950*/0x25, 0xE0, 0x24, 0xB1, 0xF5, 0x82, 0xE4, 0x34, + 0x1A, 0xF5, 0x83, 0xE4, 0x93, 0xF5, 0x0E, 0x12, +/*1960*/0x0E, 0x65, 0xF5, 0x10, 0xE5, 0x0F, 0x54, 0xF0, + 0x12, 0x0E, 0x17, 0x75, 0x83, 0x8C, 0xEF, 0xF0, +/*1970*/0xE5, 0x0F, 0x30, 0xE0, 0x0C, 0x12, 0x0E, 0x04, + 0x75, 0x83, 0x86, 0xE0, 0x44, 0x40, 0xF0, 0x80, +/*1980*/0x0A, 0x12, 0x0E, 0x04, 0x75, 0x83, 0x86, 0xE0, + 0x54, 0xBF, 0xF0, 0x12, 0x0E, 0x91, 0x75, 0x83, +/*1990*/0x82, 0xE5, 
0x0E, 0xF0, 0x22, 0x7F, 0x05, 0x12, + 0x17, 0x31, 0x12, 0x0E, 0x04, 0x12, 0x0E, 0x33, +/*19A0*/0x74, 0x02, 0xF0, 0x74, 0x8E, 0xFE, 0x12, 0x0E, + 0x04, 0x12, 0x0E, 0x0B, 0xEF, 0xF0, 0x75, 0x15, +/*19B0*/0x70, 0x12, 0x0F, 0xF7, 0x20, 0x34, 0x05, 0x75, + 0x15, 0x10, 0x80, 0x03, 0x75, 0x15, 0x50, 0x12, +/*19C0*/0x0F, 0xF7, 0x20, 0x34, 0x04, 0x74, 0x10, 0x80, + 0x02, 0x74, 0xF0, 0x25, 0x15, 0xF5, 0x15, 0x12, +/*19D0*/0x0E, 0x21, 0xEF, 0xF0, 0x12, 0x10, 0x91, 0x20, + 0x34, 0x17, 0xE5, 0x15, 0x64, 0x30, 0x60, 0x0C, +/*19E0*/0x74, 0x10, 0x25, 0x15, 0xF5, 0x15, 0xB4, 0x80, + 0x03, 0xE4, 0xF5, 0x15, 0x12, 0x0E, 0x21, 0xEF, +/*19F0*/0xF0, 0x22, 0xF0, 0xE5, 0x0B, 0x25, 0xE0, 0x25, + 0xE0, 0x24, 0x82, 0xF5, 0x82, 0xE4, 0x34, 0x07, +/*1A00*/0xF5, 0x83, 0x22, 0x74, 0x88, 0xFE, 0xE5, 0x08, + 0x44, 0x07, 0xFF, 0xF5, 0x82, 0x8E, 0x83, 0xE0, +/*1A10*/0x22, 0xF0, 0xE5, 0x08, 0x44, 0x07, 0xF5, 0x82, + 0x22, 0xF0, 0xE0, 0x54, 0xC0, 0x8F, 0x82, 0x8E, +/*1A20*/0x83, 0xF0, 0x22, 0xEF, 0x44, 0x07, 0xF5, 0x82, + 0x75, 0x83, 0x86, 0xE0, 0x54, 0x10, 0xD3, 0x94, +/*1A30*/0x00, 0x22, 0xF0, 0x90, 0x07, 0x15, 0xE0, 0x04, + 0xF0, 0x22, 0x44, 0x06, 0xF5, 0x82, 0x75, 0x83, +/*1A40*/0x9E, 0xE0, 0x22, 0xFE, 0xEF, 0x44, 0x07, 0xF5, + 0x82, 0x8E, 0x83, 0xE0, 0x22, 0xE4, 0x90, 0x07, +/*1A50*/0x2A, 0xF0, 0xA3, 0xF0, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0x82, 0xE0, 0x54, 0x7F, 0x12, 0x07, 0x29, +/*1A60*/0xE0, 0x44, 0x80, 0xF0, 0x12, 0x10, 0xFC, 0x12, + 0x08, 0x1F, 0x75, 0x83, 0xA0, 0xE0, 0x20, 0xE0, +/*1A70*/0x1A, 0x90, 0x07, 0x2B, 0xE0, 0x04, 0xF0, 0x70, + 0x06, 0x90, 0x07, 0x2A, 0xE0, 0x04, 0xF0, 0x90, +/*1A80*/0x07, 0x2A, 0xE0, 0xB4, 0x10, 0xE1, 0xA3, 0xE0, + 0xB4, 0x00, 0xDC, 0xEE, 0x44, 0xA6, 0xFC, 0xEF, +/*1A90*/0x44, 0x07, 0xF5, 0x82, 0x8C, 0x83, 0xE0, 0xF5, + 0x32, 0xEE, 0x44, 0xA8, 0xFE, 0xEF, 0x44, 0x07, +/*1AA0*/0xF5, 0x82, 0x8E, 0x83, 0xE0, 0xF5, 0x33, 0x22, + 0x01, 0x20, 0x11, 0x00, 0x04, 0x20, 0x00, 0x90, +/*1AB0*/0x00, 0x20, 0x0F, 0x92, 0x00, 0x21, 0x0F, 0x94, + 0x00, 0x22, 0x0F, 0x96, 0x00, 0x23, 
0x0F, 0x98, +/*1AC0*/0x00, 0x24, 0x0F, 0x9A, 0x00, 0x25, 0x0F, 0x9C, + 0x00, 0x26, 0x0F, 0x9E, 0x00, 0x27, 0x0F, 0xA0, +/*1AD0*/0x01, 0x20, 0x01, 0xA2, 0x01, 0x21, 0x01, 0xA4, + 0x01, 0x22, 0x01, 0xA6, 0x01, 0x23, 0x01, 0xA8, +/*1AE0*/0x01, 0x24, 0x01, 0xAA, 0x01, 0x25, 0x01, 0xAC, + 0x01, 0x26, 0x01, 0xAE, 0x01, 0x27, 0x01, 0xB0, +/*1AF0*/0x01, 0x28, 0x01, 0xB4, 0x00, 0x28, 0x0F, 0xB6, + 0x40, 0x28, 0x0F, 0xB8, 0x61, 0x28, 0x01, 0xCB, +/*1B00*/0xEF, 0xCB, 0xCA, 0xEE, 0xCA, 0x7F, 0x01, 0xE4, + 0xFD, 0xEB, 0x4A, 0x70, 0x24, 0xE5, 0x08, 0xF5, +/*1B10*/0x82, 0x74, 0xB6, 0x12, 0x08, 0x29, 0xE5, 0x08, + 0xF5, 0x82, 0x74, 0xB8, 0x12, 0x08, 0x29, 0xE5, +/*1B20*/0x08, 0xF5, 0x82, 0x74, 0xBA, 0x12, 0x08, 0x29, + 0x7E, 0x00, 0x7C, 0x00, 0x12, 0x0A, 0xFF, 0x80, +/*1B30*/0x12, 0x90, 0x07, 0x26, 0x12, 0x07, 0x35, 0xE5, + 0x41, 0xF0, 0x90, 0x07, 0x24, 0x12, 0x07, 0x35, +/*1B40*/0xE5, 0x40, 0xF0, 0x12, 0x07, 0x2A, 0x75, 0x83, + 0x8E, 0xE4, 0x12, 0x07, 0x29, 0x74, 0x01, 0x12, +/*1B50*/0x07, 0x29, 0xE4, 0xF0, 0x22, 0xE4, 0xF5, 0x26, + 0xF5, 0x27, 0x53, 0xE1, 0xFE, 0xF5, 0x2A, 0x75, +/*1B60*/0x2B, 0x01, 0xF5, 0x08, 0x7F, 0x01, 0x12, 0x17, + 0x31, 0x30, 0x30, 0x1C, 0x90, 0x1A, 0xA9, 0xE4, +/*1B70*/0x93, 0xF5, 0x10, 0x90, 0x1F, 0xF9, 0xE4, 0x93, + 0xF5, 0x10, 0x90, 0x00, 0x41, 0xE4, 0x93, 0xF5, +/*1B80*/0x10, 0x90, 0x1E, 0xCA, 0xE4, 0x93, 0xF5, 0x10, + 0x7F, 0x02, 0x12, 0x17, 0x31, 0x12, 0x0F, 0x54, +/*1B90*/0x7F, 0x03, 0x12, 0x17, 0x31, 0x12, 0x00, 0x06, + 0xE5, 0xE2, 0x30, 0xE7, 0x09, 0x12, 0x10, 0x00, +/*1BA0*/0x30, 0x30, 0x03, 0x12, 0x11, 0x00, 0x02, 0x00, + 0x47, 0x12, 0x08, 0x1F, 0x75, 0x83, 0xD0, 0xE0, +/*1BB0*/0xC4, 0x54, 0x0F, 0xFD, 0x75, 0x43, 0x01, 0x75, + 0x44, 0xFF, 0x12, 0x08, 0xAA, 0x74, 0x04, 0xF0, +/*1BC0*/0x75, 0x3B, 0x01, 0xED, 0x14, 0x60, 0x0C, 0x14, + 0x60, 0x0B, 0x14, 0x60, 0x0F, 0x24, 0x03, 0x70, +/*1BD0*/0x0B, 0x80, 0x09, 0x80, 0x00, 0x12, 0x08, 0xA7, + 0x04, 0xF0, 0x80, 0x06, 0x12, 0x08, 0xA7, 0x74, +/*1BE0*/0x04, 0xF0, 0xEE, 0x44, 0x82, 0xFE, 0xEF, 0x44, + 
0x07, 0xF5, 0x82, 0x8E, 0x83, 0xE5, 0x45, 0x12, +/*1BF0*/0x08, 0xBE, 0x75, 0x83, 0x82, 0xE5, 0x31, 0xF0, + 0x02, 0x11, 0x4C, 0x8E, 0x60, 0x8F, 0x61, 0x12, +/*1C00*/0x1E, 0xA5, 0xE4, 0xFF, 0xCE, 0xED, 0xCE, 0xEE, + 0xD3, 0x95, 0x61, 0xE5, 0x60, 0x12, 0x07, 0x6B, +/*1C10*/0x40, 0x39, 0x74, 0x20, 0x2E, 0xF5, 0x82, 0xE4, + 0x34, 0x03, 0xF5, 0x83, 0xE0, 0x70, 0x03, 0xFF, +/*1C20*/0x80, 0x26, 0x12, 0x08, 0xE2, 0xFD, 0xC3, 0x9F, + 0x40, 0x1E, 0xCF, 0xED, 0xCF, 0xEB, 0x4A, 0x70, +/*1C30*/0x0B, 0x8D, 0x42, 0x12, 0x08, 0xEE, 0xF5, 0x41, + 0x8E, 0x40, 0x80, 0x0C, 0x12, 0x08, 0xE2, 0xF5, +/*1C40*/0x38, 0x12, 0x08, 0xEE, 0xF5, 0x39, 0x8E, 0x3A, + 0x1E, 0x80, 0xBC, 0x22, 0x75, 0x58, 0x01, 0xE5, +/*1C50*/0x35, 0x70, 0x0C, 0x12, 0x07, 0xCC, 0xE0, 0xF5, + 0x4A, 0x12, 0x07, 0xD8, 0xE0, 0xF5, 0x4C, 0xE5, +/*1C60*/0x35, 0xB4, 0x04, 0x0C, 0x12, 0x07, 0xE4, 0xE0, + 0xF5, 0x4A, 0x12, 0x07, 0xF0, 0xE0, 0xF5, 0x4C, +/*1C70*/0xE5, 0x35, 0xB4, 0x01, 0x04, 0x7F, 0x01, 0x80, + 0x02, 0x7F, 0x00, 0xE5, 0x35, 0xB4, 0x02, 0x04, +/*1C80*/0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, 0x4F, + 0x60, 0x0C, 0x12, 0x07, 0xFC, 0xE0, 0xF5, 0x4A, +/*1C90*/0x12, 0x08, 0x08, 0xE0, 0xF5, 0x4C, 0x85, 0x41, + 0x49, 0x85, 0x40, 0x4B, 0x22, 0x75, 0x5B, 0x01, +/*1CA0*/0x90, 0x07, 0x24, 0x12, 0x07, 0x35, 0xE0, 0x54, + 0x1F, 0xFF, 0xD3, 0x94, 0x02, 0x50, 0x04, 0x8F, +/*1CB0*/0x58, 0x80, 0x05, 0xEF, 0x24, 0xFE, 0xF5, 0x58, + 0xEF, 0xC3, 0x94, 0x18, 0x40, 0x05, 0x75, 0x59, +/*1CC0*/0x18, 0x80, 0x04, 0xEF, 0x04, 0xF5, 0x59, 0x85, + 0x43, 0x5A, 0xAF, 0x58, 0x7E, 0x00, 0xAD, 0x59, +/*1CD0*/0x7C, 0x00, 0xAB, 0x5B, 0x7A, 0x00, 0x12, 0x15, + 0x41, 0xAF, 0x5A, 0x7E, 0x00, 0x12, 0x18, 0x0A, +/*1CE0*/0xAF, 0x5B, 0x7E, 0x00, 0x02, 0x1A, 0xFF, 0xE5, + 0xE2, 0x30, 0xE7, 0x0E, 0x12, 0x10, 0x03, 0xC2, +/*1CF0*/0x30, 0x30, 0x30, 0x03, 0x12, 0x10, 0xFF, 0x20, + 0x33, 0x28, 0xE5, 0xE7, 0x30, 0xE7, 0x05, 0x12, +/*1D00*/0x0E, 0xA2, 0x80, 0x0D, 0xE5, 0xFE, 0xC3, 0x94, + 0x20, 0x50, 0x06, 0x12, 0x0E, 0xA2, 0x43, 0xF9, +/*1D10*/0x08, 0xE5, 
0xF2, 0x30, 0xE7, 0x03, 0x53, 0xF9, + 0x7F, 0xE5, 0xF1, 0x54, 0x70, 0xD3, 0x94, 0x00, +/*1D20*/0x50, 0xD8, 0x22, 0x12, 0x0E, 0x04, 0x75, 0x83, + 0x80, 0xE4, 0xF0, 0xE5, 0x08, 0x44, 0x07, 0x12, +/*1D30*/0x0D, 0xFD, 0x75, 0x83, 0x84, 0x12, 0x0E, 0x02, + 0x75, 0x83, 0x86, 0x12, 0x0E, 0x02, 0x75, 0x83, +/*1D40*/0x8C, 0xE0, 0x54, 0xF3, 0x12, 0x0E, 0x03, 0x75, + 0x83, 0x8E, 0x12, 0x0E, 0x02, 0x75, 0x83, 0x94, +/*1D50*/0xE0, 0x54, 0xFB, 0xF0, 0x22, 0x12, 0x07, 0x2A, + 0x75, 0x83, 0x8E, 0xE4, 0x12, 0x07, 0x29, 0x74, +/*1D60*/0x01, 0x12, 0x07, 0x29, 0xE4, 0x12, 0x08, 0xBE, + 0x75, 0x83, 0x8C, 0xE0, 0x44, 0x20, 0x12, 0x08, +/*1D70*/0xBE, 0xE0, 0x54, 0xDF, 0xF0, 0x74, 0x84, 0x85, + 0x08, 0x82, 0xF5, 0x83, 0xE0, 0x54, 0x7F, 0xF0, +/*1D80*/0xE0, 0x44, 0x80, 0xF0, 0x22, 0x75, 0x56, 0x01, + 0xE4, 0xFD, 0xF5, 0x57, 0xAF, 0x35, 0xFE, 0xFC, +/*1D90*/0x12, 0x09, 0x15, 0x12, 0x1C, 0x9D, 0x12, 0x1E, + 0x7A, 0x12, 0x1C, 0x4C, 0xAF, 0x57, 0x7E, 0x00, +/*1DA0*/0xAD, 0x56, 0x7C, 0x00, 0x12, 0x04, 0x44, 0xAF, + 0x56, 0x7E, 0x00, 0x02, 0x11, 0xEE, 0x75, 0x56, +/*1DB0*/0x01, 0xE4, 0xFD, 0xF5, 0x57, 0xAF, 0x35, 0xFE, + 0xFC, 0x12, 0x09, 0x15, 0x12, 0x1C, 0x9D, 0x12, +/*1DC0*/0x1E, 0x7A, 0x12, 0x1C, 0x4C, 0xAF, 0x57, 0x7E, + 0x00, 0xAD, 0x56, 0x7C, 0x00, 0x12, 0x04, 0x44, +/*1DD0*/0xAF, 0x56, 0x7E, 0x00, 0x02, 0x11, 0xEE, 0xE4, + 0xF5, 0x16, 0x12, 0x0E, 0x44, 0xFE, 0xE5, 0x08, +/*1DE0*/0x44, 0x05, 0xFF, 0x12, 0x0E, 0x65, 0x8F, 0x82, + 0x8E, 0x83, 0xF0, 0x05, 0x16, 0xE5, 0x16, 0xC3, +/*1DF0*/0x94, 0x14, 0x40, 0xE6, 0xE5, 0x08, 0x12, 0x0E, + 0x2B, 0xE4, 0xF0, 0x22, 0xE4, 0xF5, 0x58, 0xF5, +/*1E00*/0x59, 0xF5, 0x5A, 0xFF, 0xFE, 0xAD, 0x58, 0xFC, + 0x12, 0x09, 0x15, 0x7F, 0x04, 0x7E, 0x00, 0xAD, +/*1E10*/0x58, 0x7C, 0x00, 0x12, 0x09, 0x15, 0x7F, 0x02, + 0x7E, 0x00, 0xAD, 0x58, 0x7C, 0x00, 0x02, 0x09, +/*1E20*/0x15, 0xE5, 0x3C, 0x25, 0x3E, 0xFC, 0xE5, 0x42, + 0x24, 0x00, 0xFB, 0xE4, 0x33, 0xFA, 0xEC, 0xC3, +/*1E30*/0x9B, 0xEA, 0x12, 0x07, 0x6B, 0x40, 0x0B, 0x8C, + 0x42, 0xE5, 0x3D, 0x25, 0x3F, 0xF5, 
0x41, 0x8F, +/*1E40*/0x40, 0x22, 0x12, 0x09, 0x0B, 0x22, 0x74, 0x84, + 0xF5, 0x18, 0x85, 0x08, 0x19, 0x85, 0x19, 0x82, +/*1E50*/0x85, 0x18, 0x83, 0xE0, 0x54, 0x7F, 0xF0, 0xE0, + 0x44, 0x80, 0xF0, 0xE0, 0x44, 0x80, 0xF0, 0x22, +/*1E60*/0xEF, 0x4E, 0x70, 0x0B, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0xD2, 0xE0, 0x54, 0xDF, 0xF0, 0x22, 0x12, +/*1E70*/0x07, 0x2A, 0x75, 0x83, 0xD2, 0xE0, 0x44, 0x20, + 0xF0, 0x22, 0x75, 0x58, 0x01, 0x90, 0x07, 0x26, +/*1E80*/0x12, 0x07, 0x35, 0xE0, 0x54, 0x3F, 0xF5, 0x41, + 0x12, 0x07, 0x32, 0xE0, 0x54, 0x3F, 0xF5, 0x40, +/*1E90*/0x22, 0x75, 0x56, 0x02, 0xE4, 0xF5, 0x57, 0x12, + 0x1D, 0xFC, 0xAF, 0x57, 0x7E, 0x00, 0xAD, 0x56, +/*1EA0*/0x7C, 0x00, 0x02, 0x04, 0x44, 0xE4, 0xF5, 0x42, + 0xF5, 0x41, 0xF5, 0x40, 0xF5, 0x38, 0xF5, 0x39, +/*1EB0*/0xF5, 0x3A, 0x22, 0xEF, 0x54, 0x07, 0xFF, 0xE5, + 0xF9, 0x54, 0xF8, 0x4F, 0xF5, 0xF9, 0x22, 0x7F, +/*1EC0*/0x01, 0xE4, 0xFE, 0x0F, 0x0E, 0xBE, 0xFF, 0xFB, + 0x22, 0x01, 0x20, 0x00, 0x01, 0x04, 0x20, 0x00, +/*1ED0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1EE0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1EF0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F00*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F10*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F20*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F30*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F40*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F50*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F60*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F70*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F80*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F90*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FA0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FB0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FC0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FD0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FE0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FF0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x01, 0x20, 0x11, 0x00, 0x04, 0x20, 0x00, 0x81 +}; + +int ipath_sd7220_ib_load(struct ipath_devdata *dd) +{ + return ipath_sd7220_prog_ld(dd, IB_7220_SERDES, ipath_sd7220_ib_img, + sizeof(ipath_sd7220_ib_img), 0); +} + +int ipath_sd7220_ib_vfy(struct ipath_devdata *dd) +{ + return ipath_sd7220_prog_vfy(dd, IB_7220_SERDES, ipath_sd7220_ib_img, + sizeof(ipath_sd7220_ib_img), 0); +} From ralph.campbell at qlogic.com Wed Apr 2 15:50:18 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:18 -0700 Subject: [ofa-general] [PATCH 15/20] IB/ipath - Add code for IBA7220 send DMA In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225018.28598.2373.stgit@eng-46.mv.qlogic.com> From: John Gregor The IBA7220 HCA has a new feature to DMA data to the on chip send buffers instead of or in addition to the host CPU doing the data transfer. This patch adds code to support the send DMA queue. 
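[Editorial illustration, not part of the patch: the descriptor layout the patch uses can be sketched as a standalone program. The field positions below are taken from the comments in make_sdma_desc() and the length extraction in unmap_desc(); the function names pack_sdma_desc, desc_addr and desc_byte_len are hypothetical helpers for illustration only.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone sketch of the two-qword SDMA descriptor format
 * used by make_sdma_desc() in this patch:
 *   qword 1: SDmaPhyAddr[47:32]
 *   qword 0: SDmaPhyAddr[31:0] in bits 63:32, SDmaGeneration[1:0] in
 *            bits 31:30, SDmaDwordCount[10:0] in bits 26:16,
 *            SDmaBufOffset[12:2] in bits 10:0. */
static void pack_sdma_desc(uint64_t desc[2], uint64_t addr,
                           uint64_t generation, uint64_t dwlen,
                           uint64_t dwoffset)
{
    desc[1] = addr >> 32;                    /* SDmaPhyAddr[47:32] */
    desc[0] = (addr & 0xfffffffcULL) << 32;  /* SDmaPhyAddr[31:0] */
    desc[0] |= (generation & 3ULL) << 30;    /* SDmaGeneration[1:0] */
    desc[0] |= (dwlen & 0x7ffULL) << 16;     /* SDmaDwordCount[10:0] */
    desc[0] |= dwoffset & 0x7ffULL;          /* SDmaBufOffset[12:2] */
}

/* Recover the DMA address, as in unmap_desc(). */
static uint64_t desc_addr(const uint64_t desc[2])
{
    return (desc[1] << 32) | (desc[0] >> 32);
}

/* Recover the length in bytes, using the same trick as unmap_desc():
 * shifting the dword count down by 14 instead of 16 leaves it
 * pre-multiplied by 4, so the masked result is already a byte count. */
static uint64_t desc_byte_len(const uint64_t desc[2])
{
    return (desc[0] >> 14) & (0x7ffULL << 2);
}
```

A 0x40-dword buffer at a 4-byte-aligned address round-trips through these helpers to the same address and a 256-byte length, matching what unmap_desc() would hand to dma_unmap_single().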
Signed-off-by: John Gregor --- drivers/infiniband/hw/ipath/ipath_sdma.c | 743 ++++++++++++++++++++++++++++++ 1 files changed, 743 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_sdma.c b/drivers/infiniband/hw/ipath/ipath_sdma.c new file mode 100644 index 0000000..5918caf --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_sdma.c @@ -0,0 +1,743 @@ +/* + * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include + +#include "ipath_kernel.h" +#include "ipath_verbs.h" +#include "ipath_common.h" + +#define SDMA_DESCQ_SZ PAGE_SIZE /* 256 entries per 4KB page */ + +static void vl15_watchdog_enq(struct ipath_devdata *dd) +{ + /* ipath_sdma_lock must already be held */ + if (atomic_inc_return(&dd->ipath_sdma_vl15_count) == 1) { + unsigned long interval = (HZ + 19) / 20; + dd->ipath_sdma_vl15_timer.expires = jiffies + interval; + add_timer(&dd->ipath_sdma_vl15_timer); + } +} + +static void vl15_watchdog_deq(struct ipath_devdata *dd) +{ + /* ipath_sdma_lock must already be held */ + if (atomic_dec_return(&dd->ipath_sdma_vl15_count) != 0) { + unsigned long interval = (HZ + 19) / 20; + mod_timer(&dd->ipath_sdma_vl15_timer, jiffies + interval); + } else { + del_timer(&dd->ipath_sdma_vl15_timer); + } +} + +static void vl15_watchdog_timeout(unsigned long opaque) +{ + struct ipath_devdata *dd = (struct ipath_devdata *)opaque; + + if (atomic_read(&dd->ipath_sdma_vl15_count) != 0) { + ipath_dbg("vl15 watchdog timeout - clearing\n"); + ipath_cancel_sends(dd, 1); + ipath_hol_down(dd); + } else { + ipath_dbg("vl15 watchdog timeout - " + "condition already cleared\n"); + } +} + +static void unmap_desc(struct ipath_devdata *dd, unsigned head) +{ + __le64 *descqp = &dd->ipath_sdma_descq[head].qw[0]; + u64 desc[2]; + dma_addr_t addr; + size_t len; + + desc[0] = le64_to_cpu(descqp[0]); + desc[1] = le64_to_cpu(descqp[1]); + + addr = (desc[1] << 32) | (desc[0] >> 32); + len = (desc[0] >> 14) & (0x7ffULL << 2); + dma_unmap_single(&dd->pcidev->dev, addr, len, DMA_TO_DEVICE); +} + +/* + * ipath_sdma_lock should be locked before calling this. 
+ */ +int ipath_sdma_make_progress(struct ipath_devdata *dd) +{ + struct list_head *lp = NULL; + struct ipath_sdma_txreq *txp = NULL; + u16 dmahead; + u16 start_idx = 0; + int progress = 0; + + if (!list_empty(&dd->ipath_sdma_activelist)) { + lp = dd->ipath_sdma_activelist.next; + txp = list_entry(lp, struct ipath_sdma_txreq, list); + start_idx = txp->start_idx; + } + + /* + * Read the SDMA head register in order to know that the + * interrupt clear has been written to the chip. + * Otherwise, we may not get an interrupt for the last + * descriptor in the queue. + */ + dmahead = (u16)ipath_read_kreg32(dd, dd->ipath_kregs->kr_senddmahead); + /* sanity check return value for error handling (chip reset, etc.) */ + if (dmahead >= dd->ipath_sdma_descq_cnt) + goto done; + + while (dd->ipath_sdma_descq_head != dmahead) { + if (txp && txp->flags & IPATH_SDMA_TXREQ_F_FREEDESC && + dd->ipath_sdma_descq_head == start_idx) { + unmap_desc(dd, dd->ipath_sdma_descq_head); + start_idx++; + if (start_idx == dd->ipath_sdma_descq_cnt) + start_idx = 0; + } + + /* increment free count and head */ + dd->ipath_sdma_descq_removed++; + if (++dd->ipath_sdma_descq_head == dd->ipath_sdma_descq_cnt) + dd->ipath_sdma_descq_head = 0; + + if (txp && txp->next_descq_idx == dd->ipath_sdma_descq_head) { + /* move to notify list */ + if (txp->flags & IPATH_SDMA_TXREQ_F_VL15) + vl15_watchdog_deq(dd); + list_move_tail(lp, &dd->ipath_sdma_notifylist); + if (!list_empty(&dd->ipath_sdma_activelist)) { + lp = dd->ipath_sdma_activelist.next; + txp = list_entry(lp, struct ipath_sdma_txreq, + list); + start_idx = txp->start_idx; + } else { + lp = NULL; + txp = NULL; + } + } + progress = 1; + } + + if (progress) + tasklet_hi_schedule(&dd->ipath_sdma_notify_task); + +done: + return progress; +} + +static void ipath_sdma_notify(struct ipath_devdata *dd, struct list_head *list) +{ + struct ipath_sdma_txreq *txp, *txp_next; + + list_for_each_entry_safe(txp, txp_next, list, list) { + list_del_init(&txp->list); + + 
if (txp->callback) + (*txp->callback)(txp->callback_cookie, + txp->callback_status); + } +} + +static void sdma_notify_taskbody(struct ipath_devdata *dd) +{ + unsigned long flags; + struct list_head list; + + INIT_LIST_HEAD(&list); + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + list_splice_init(&dd->ipath_sdma_notifylist, &list); + + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + ipath_sdma_notify(dd, &list); + + /* + * The IB verbs layer needs to see the callback before getting + * the call to ipath_ib_piobufavail() because the callback + * handles releasing resources the next send will need. + * Otherwise, we could do these calls in + * ipath_sdma_make_progress(). + */ + ipath_ib_piobufavail(dd->verbs_dev); +} + +static void sdma_notify_task(unsigned long opaque) +{ + struct ipath_devdata *dd = (struct ipath_devdata *)opaque; + + if (!test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + sdma_notify_taskbody(dd); +} + +static void dump_sdma_state(struct ipath_devdata *dd) +{ + unsigned long reg; + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmastatus); + ipath_cdbg(VERBOSE, "kr_senddmastatus: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_sendctrl); + ipath_cdbg(VERBOSE, "kr_sendctrl: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmabufmask0); + ipath_cdbg(VERBOSE, "kr_senddmabufmask0: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmabufmask1); + ipath_cdbg(VERBOSE, "kr_senddmabufmask1: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmabufmask2); + ipath_cdbg(VERBOSE, "kr_senddmabufmask2: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmatail); + ipath_cdbg(VERBOSE, "kr_senddmatail: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmahead); + ipath_cdbg(VERBOSE, "kr_senddmahead: 0x%016lx\n", reg); +} + +static void sdma_abort_task(unsigned long opaque) +{ + struct 
ipath_devdata *dd = (struct ipath_devdata *) opaque; + int kick = 0; + u64 status; + unsigned long flags; + + if (test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + return; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + status = dd->ipath_sdma_status & IPATH_SDMA_ABORT_MASK; + + /* nothing to do */ + if (status == IPATH_SDMA_ABORT_NONE) + goto unlock; + + /* ipath_sdma_abort() is done, waiting for interrupt */ + if (status == IPATH_SDMA_ABORT_DISARMED) { + if (jiffies < dd->ipath_sdma_abort_intr_timeout) + goto resched_noprint; + /* give up, intr got lost somewhere */ + ipath_dbg("give up waiting for SDMADISABLED intr\n"); + __set_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + status = IPATH_SDMA_ABORT_ABORTED; + } + + /* everything is stopped, time to clean up and restart */ + if (status == IPATH_SDMA_ABORT_ABORTED) { + struct ipath_sdma_txreq *txp, *txpnext; + u64 hwstatus; + int notify = 0; + + hwstatus = ipath_read_kreg64(dd, + dd->ipath_kregs->kr_senddmastatus); + + if (/* ScoreBoardDrainInProg */ + test_bit(63, &hwstatus) || + /* AbortInProg */ + test_bit(62, &hwstatus) || + /* InternalSDmaEnable */ + test_bit(61, &hwstatus) || + /* ScbEmpty */ + !test_bit(30, &hwstatus)) { + if (dd->ipath_sdma_reset_wait > 0) { + /* not done shutting down sdma */ + --dd->ipath_sdma_reset_wait; + goto resched; + } + ipath_cdbg(VERBOSE, "gave up waiting for quiescent " + "status after SDMA reset, continuing\n"); + dump_sdma_state(dd); + } + + /* dequeue all "sent" requests */ + list_for_each_entry_safe(txp, txpnext, + &dd->ipath_sdma_activelist, list) { + txp->callback_status = IPATH_SDMA_TXREQ_S_ABORTED; + if (txp->flags & IPATH_SDMA_TXREQ_F_VL15) + vl15_watchdog_deq(dd); + list_move_tail(&txp->list, &dd->ipath_sdma_notifylist); + notify = 1; + } + if (notify) + tasklet_hi_schedule(&dd->ipath_sdma_notify_task); + + /* reset our notion of head and tail */ + dd->ipath_sdma_descq_tail = 0; + dd->ipath_sdma_descq_head = 0; + dd->ipath_sdma_head_dma[0] = 0; + 
dd->ipath_sdma_generation = 0; + dd->ipath_sdma_descq_removed = dd->ipath_sdma_descq_added; + + /* Reset SendDmaLenGen */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmalengen, + (u64) dd->ipath_sdma_descq_cnt | (1ULL << 18)); + + /* done with sdma state for a bit */ + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + /* restart sdma engine */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl &= ~INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + kick = 1; + ipath_dbg("sdma restarted from abort\n"); + + /* now clear status bits */ + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + __clear_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + __clear_bit(IPATH_SDMA_DISARMED, &dd->ipath_sdma_status); + __clear_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + + /* make sure I see next message */ + dd->ipath_sdma_abort_jiffies = 0; + + goto unlock; + } + +resched: + /* + * for now, keep spinning + * JAG - this is bad to just have default be a loop without + * state change + */ + if (jiffies > dd->ipath_sdma_abort_jiffies) { + ipath_dbg("looping with status 0x%016llx\n", + dd->ipath_sdma_status); + dd->ipath_sdma_abort_jiffies = jiffies + 5 * HZ; + } +resched_noprint: + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (!test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + tasklet_hi_schedule(&dd->ipath_sdma_abort_task); + return; + +unlock: + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + /* kick upper layers */ + if (kick) + ipath_ib_piobufavail(dd->verbs_dev); +} + +/* + * This is called from interrupt context. 
+ */ +void ipath_sdma_intr(struct ipath_devdata *dd) +{ + unsigned long flags; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + (void) ipath_sdma_make_progress(dd); + + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); +} + +static int alloc_sdma(struct ipath_devdata *dd) +{ + int ret = 0; + + /* Allocate memory for SendDMA descriptor FIFO */ + dd->ipath_sdma_descq = dma_alloc_coherent(&dd->pcidev->dev, + SDMA_DESCQ_SZ, &dd->ipath_sdma_descq_phys, GFP_KERNEL); + + if (!dd->ipath_sdma_descq) { + ipath_dev_err(dd, "failed to allocate SendDMA descriptor " + "FIFO memory\n"); + ret = -ENOMEM; + goto done; + } + + dd->ipath_sdma_descq_cnt = + SDMA_DESCQ_SZ / sizeof(struct ipath_sdma_desc); + + /* Allocate memory for DMA of head register to memory */ + dd->ipath_sdma_head_dma = dma_alloc_coherent(&dd->pcidev->dev, + PAGE_SIZE, &dd->ipath_sdma_head_phys, GFP_KERNEL); + if (!dd->ipath_sdma_head_dma) { + ipath_dev_err(dd, "failed to allocate SendDMA head memory\n"); + ret = -ENOMEM; + goto cleanup_descq; + } + dd->ipath_sdma_head_dma[0] = 0; + + init_timer(&dd->ipath_sdma_vl15_timer); + dd->ipath_sdma_vl15_timer.function = vl15_watchdog_timeout; + dd->ipath_sdma_vl15_timer.data = (unsigned long)dd; + atomic_set(&dd->ipath_sdma_vl15_count, 0); + + goto done; + +cleanup_descq: + dma_free_coherent(&dd->pcidev->dev, SDMA_DESCQ_SZ, + (void *)dd->ipath_sdma_descq, dd->ipath_sdma_descq_phys); + dd->ipath_sdma_descq = NULL; + dd->ipath_sdma_descq_phys = 0; +done: + return ret; +} + +int setup_sdma(struct ipath_devdata *dd) +{ + int ret = 0; + unsigned i, n; + u64 tmp64; + u64 senddmabufmask[3] = { 0 }; + unsigned long flags; + + ret = alloc_sdma(dd); + if (ret) + goto done; + + if (!dd->ipath_sdma_descq) { + ipath_dev_err(dd, "SendDMA memory not allocated\n"); + goto done; + } + + dd->ipath_sdma_status = 0; + dd->ipath_sdma_abort_jiffies = 0; + dd->ipath_sdma_generation = 0; + dd->ipath_sdma_descq_tail = 0; + dd->ipath_sdma_descq_head = 0; + dd->ipath_sdma_descq_removed = 
0; + dd->ipath_sdma_descq_added = 0; + + /* Set SendDmaBase */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabase, + dd->ipath_sdma_descq_phys); + /* Set SendDmaLenGen */ + tmp64 = dd->ipath_sdma_descq_cnt; + tmp64 |= 1<<18; /* enable generation checking */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmalengen, tmp64); + /* Set SendDmaTail */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmatail, + dd->ipath_sdma_descq_tail); + /* Set SendDmaHeadAddr */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmaheadaddr, + dd->ipath_sdma_head_phys); + + /* Reserve all the former "kernel" piobufs */ + n = dd->ipath_piobcnt2k + dd->ipath_piobcnt4k - dd->ipath_pioreserved; + for (i = dd->ipath_lastport_piobuf; i < n; ++i) { + unsigned word = i / 64; + unsigned bit = i & 63; + BUG_ON(word >= 3); + senddmabufmask[word] |= 1ULL << bit; + } + ipath_chg_pioavailkernel(dd, dd->ipath_lastport_piobuf, + n - dd->ipath_lastport_piobuf, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask0, + senddmabufmask[0]); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask1, + senddmabufmask[1]); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask2, + senddmabufmask[2]); + + INIT_LIST_HEAD(&dd->ipath_sdma_activelist); + INIT_LIST_HEAD(&dd->ipath_sdma_notifylist); + + tasklet_init(&dd->ipath_sdma_notify_task, sdma_notify_task, + (unsigned long) dd); + tasklet_init(&dd->ipath_sdma_abort_task, sdma_abort_task, + (unsigned long) dd); + + /* Turn on SDMA */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE | + INFINIPATH_S_SDMAINTENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + __set_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + +done: + return ret; +} + +void teardown_sdma(struct ipath_devdata *dd) +{ + struct ipath_sdma_txreq *txp, *txpnext; + unsigned long flags; + 
dma_addr_t sdma_head_phys = 0; + dma_addr_t sdma_descq_phys = 0; + void *sdma_descq = NULL; + void *sdma_head_dma = NULL; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + __clear_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status); + __set_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + __set_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + tasklet_kill(&dd->ipath_sdma_abort_task); + tasklet_kill(&dd->ipath_sdma_notify_task); + + /* turn off sdma */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl &= ~INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + /* dequeue all "sent" requests */ + list_for_each_entry_safe(txp, txpnext, &dd->ipath_sdma_activelist, + list) { + txp->callback_status = IPATH_SDMA_TXREQ_S_SHUTDOWN; + if (txp->flags & IPATH_SDMA_TXREQ_F_VL15) + vl15_watchdog_deq(dd); + list_move_tail(&txp->list, &dd->ipath_sdma_notifylist); + } + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + sdma_notify_taskbody(dd); + + del_timer_sync(&dd->ipath_sdma_vl15_timer); + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + dd->ipath_sdma_abort_jiffies = 0; + + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabase, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmalengen, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmatail, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmaheadaddr, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask0, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask1, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask2, 0); + + if (dd->ipath_sdma_head_dma) { + sdma_head_dma = (void *) dd->ipath_sdma_head_dma; + sdma_head_phys = dd->ipath_sdma_head_phys; + dd->ipath_sdma_head_dma = NULL; + 
dd->ipath_sdma_head_phys = 0; + } + + if (dd->ipath_sdma_descq) { + sdma_descq = dd->ipath_sdma_descq; + sdma_descq_phys = dd->ipath_sdma_descq_phys; + dd->ipath_sdma_descq = NULL; + dd->ipath_sdma_descq_phys = 0; + } + + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + if (sdma_head_dma) + dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, + sdma_head_dma, sdma_head_phys); + + if (sdma_descq) + dma_free_coherent(&dd->pcidev->dev, SDMA_DESCQ_SZ, + sdma_descq, sdma_descq_phys); +} + +static inline void make_sdma_desc(struct ipath_devdata *dd, + u64 *sdmadesc, u64 addr, u64 dwlen, u64 dwoffset) +{ + WARN_ON(addr & 3); + /* SDmaPhyAddr[47:32] */ + sdmadesc[1] = addr >> 32; + /* SDmaPhyAddr[31:0] */ + sdmadesc[0] = (addr & 0xfffffffcULL) << 32; + /* SDmaGeneration[1:0] */ + sdmadesc[0] |= (dd->ipath_sdma_generation & 3ULL) << 30; + /* SDmaDwordCount[10:0] */ + sdmadesc[0] |= (dwlen & 0x7ffULL) << 16; + /* SDmaBufOffset[12:2] */ + sdmadesc[0] |= dwoffset & 0x7ffULL; +} + +/* + * This function queues one IB packet onto the send DMA queue per call. + * The caller is responsible for checking: + * 1) The number of send DMA descriptor entries is less than the size of + * the descriptor queue. + * 2) The IB SGE addresses and lengths are 32-bit aligned + * (except possibly the last SGE's length) + * 3) The SGE addresses are suitable for passing to dma_map_single(). 
+ */ +int ipath_sdma_verbs_send(struct ipath_devdata *dd, + struct ipath_sge_state *ss, u32 dwords, + struct ipath_verbs_txreq *tx) +{ + + unsigned long flags; + struct ipath_sge *sge; + int ret = 0; + u16 tail; + __le64 *descqp; + u64 sdmadesc[2]; + u32 dwoffset; + dma_addr_t addr; + + if ((tx->map_len + (dwords<<2)) > dd->ipath_ibmaxlen) { + ipath_dbg("packet size %X > ibmax %X, fail\n", + tx->map_len + (dwords<<2), dd->ipath_ibmaxlen); + ret = -EMSGSIZE; + goto fail; + } + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + +retry: + if (unlikely(test_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status))) { + ret = -EBUSY; + goto unlock; + } + + if (tx->txreq.sg_count > ipath_sdma_descq_freecnt(dd)) { + if (ipath_sdma_make_progress(dd)) + goto retry; + ret = -ENOBUFS; + goto unlock; + } + + addr = dma_map_single(&dd->pcidev->dev, tx->txreq.map_addr, + tx->map_len, DMA_TO_DEVICE); + if (dma_mapping_error(addr)) { + ret = -EIO; + goto unlock; + } + + dwoffset = tx->map_len >> 2; + make_sdma_desc(dd, sdmadesc, (u64) addr, dwoffset, 0); + + /* SDmaFirstDesc */ + sdmadesc[0] |= 1ULL << 12; + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_USELARGEBUF) + sdmadesc[0] |= 1ULL << 14; /* SDmaUseLargeBuf */ + + /* write to the descq */ + tail = dd->ipath_sdma_descq_tail; + descqp = &dd->ipath_sdma_descq[tail].qw[0]; + *descqp++ = cpu_to_le64(sdmadesc[0]); + *descqp++ = cpu_to_le64(sdmadesc[1]); + + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_FREEDESC) + tx->txreq.start_idx = tail; + + /* increment the tail */ + if (++tail == dd->ipath_sdma_descq_cnt) { + tail = 0; + descqp = &dd->ipath_sdma_descq[0].qw[0]; + ++dd->ipath_sdma_generation; + } + + sge = &ss->sge; + while (dwords) { + u32 dw; + u32 len; + + len = dwords << 2; + if (len > sge->length) + len = sge->length; + if (len > sge->sge_length) + len = sge->sge_length; + BUG_ON(len == 0); + dw = (len + 3) >> 2; + addr = dma_map_single(&dd->pcidev->dev, sge->vaddr, dw << 2, + DMA_TO_DEVICE); + make_sdma_desc(dd, sdmadesc, (u64) addr, 
dw, dwoffset); + /* SDmaUseLargeBuf has to be set in every descriptor */ + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_USELARGEBUF) + sdmadesc[0] |= 1ULL << 14; + /* write to the descq */ + *descqp++ = cpu_to_le64(sdmadesc[0]); + *descqp++ = cpu_to_le64(sdmadesc[1]); + + /* increment the tail */ + if (++tail == dd->ipath_sdma_descq_cnt) { + tail = 0; + descqp = &dd->ipath_sdma_descq[0].qw[0]; + ++dd->ipath_sdma_generation; + } + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = + sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = + sge->mr->map[sge->m]->segs[sge->n].length; + } + + dwoffset += dw; + dwords -= dw; + } + + if (!tail) + descqp = &dd->ipath_sdma_descq[dd->ipath_sdma_descq_cnt].qw[0]; + descqp -= 2; + /* SDmaLastDesc */ + descqp[0] |= __constant_cpu_to_le64(1ULL << 11); + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_INTREQ) { + /* SDmaIntReq */ + descqp[0] |= __constant_cpu_to_le64(1ULL << 15); + } + + /* Commit writes to memory and advance the tail on the chip */ + wmb(); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmatail, tail); + + tx->txreq.next_descq_idx = tail; + tx->txreq.callback_status = IPATH_SDMA_TXREQ_S_OK; + dd->ipath_sdma_descq_tail = tail; + dd->ipath_sdma_descq_added += tx->txreq.sg_count; + list_add_tail(&tx->txreq.list, &dd->ipath_sdma_activelist); + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_VL15) + vl15_watchdog_enq(dd); + +unlock: + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); +fail: + return ret; +} From ralph.campbell at qlogic.com Wed Apr 2 15:50:23 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:23 -0700 Subject: [ofa-general] [PATCH 16/20] IB/ipath - user mode send DMA header file In-Reply-To: 
<20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225023.28598.53701.stgit@eng-46.mv.qlogic.com> From: Arthur Jones A new header file which allows the iba7220 send DMA engine to be used from userland. The definitions here are not used yet, that will happen in a follow-on patch... Signed-off-by: Arthur Jones --- drivers/infiniband/hw/ipath/ipath_user_sdma.h | 56 +++++++++++++++++++++++++ 1 files changed, 56 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_user_sdma.h b/drivers/infiniband/hw/ipath/ipath_user_sdma.h new file mode 100644 index 0000000..ce0448f --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_user_sdma.h @@ -0,0 +1,56 @@ +/* + * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#include + +struct ipath_user_sdma_queue; + +struct ipath_user_sdma_queue * +ipath_user_sdma_queue_create(struct device *dev, int unit, int port, int sport); +void ipath_user_sdma_queue_destroy(struct ipath_user_sdma_queue *pq); + +int ipath_user_sdma_writev(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + const struct iovec *iov, + unsigned long dim); + +int ipath_user_sdma_make_progress(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq); + +int ipath_user_sdma_pkt_sent(const struct ipath_user_sdma_queue *pq, + u32 counter); +void ipath_user_sdma_queue_drain(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq); + +u32 ipath_user_sdma_complete_counter(const struct ipath_user_sdma_queue *pq); +void ipath_user_sdma_set_complete_counter(struct ipath_user_sdma_queue *pq, + u32 c); +u32 ipath_user_sdma_inflight_counter(struct ipath_user_sdma_queue *pq); From ralph.campbell at qlogic.com Wed Apr 2 15:50:28 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:28 -0700 Subject: [ofa-general] [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> From: Arthur Jones A new file which allows the iba7220 send DMA engine to be used from userland. The routines here are not linked in yet, that will happen in a follow-on patch... 
Signed-off-by: Arthur Jones --- drivers/infiniband/hw/ipath/ipath_user_sdma.c | 888 +++++++++++++++++++++++++ 1 files changed, 888 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_user_sdma.c b/drivers/infiniband/hw/ipath/ipath_user_sdma.c new file mode 100644 index 0000000..44020c8 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_user_sdma.c @@ -0,0 +1,888 @@ +/* + * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" +#include "ipath_user_sdma.h" + +/* minimum size of header */ +#define IPATH_USER_SDMA_MIN_HEADER_LENGTH 64 +/* expected size of headers (for dma_pool) */ +#define IPATH_USER_SDMA_EXP_HEADER_LENGTH 64 +/* length mask in PBC (lower 11 bits) */ +#define IPATH_PBC_LENGTH_MASK ((1 << 11) - 1) + +struct ipath_user_sdma_pkt { + u8 naddr; /* dimension of addr (1..3) ... */ + u32 counter; /* sdma pkts queued counter for this entry */ + u64 added; /* global descq number of entries */ + + struct { + u32 offset; /* offset for kvaddr, addr */ + u32 length; /* length in page */ + u8 put_page; /* should we put_page? */ + u8 dma_mapped; /* is page dma_mapped? */ + struct page *page; /* may be NULL (coherent mem) */ + void *kvaddr; /* FIXME: only for pio hack */ + dma_addr_t addr; + } addr[4]; /* max pages, any more and we coalesce */ + struct list_head list; /* list element */ +}; + +struct ipath_user_sdma_queue { + /* + * pkts sent to dma engine are queued on this + * list head. the type of the elements of this + * list are struct ipath_user_sdma_pkt... + */ + struct list_head sent; + + /* headers with expected length are allocated from here... */ + char header_cache_name[64]; + struct dma_pool *header_cache; + + /* packets are allocated from the slab cache... */ + char pkt_slab_name[64]; + struct kmem_cache *pkt_slab; + + /* as packets go on the queued queue, they are counted... */ + u32 counter; + u32 sent_counter; + + /* dma page table */ + struct rb_root dma_pages_root; + + /* protect everything above... 
*/ + struct mutex lock; +}; + +struct ipath_user_sdma_queue * +ipath_user_sdma_queue_create(struct device *dev, int unit, int port, int sport) +{ + struct ipath_user_sdma_queue *pq = + kmalloc(sizeof(struct ipath_user_sdma_queue), GFP_KERNEL); + + if (!pq) + goto done; + + pq->counter = 0; + pq->sent_counter = 0; + INIT_LIST_HEAD(&pq->sent); + + mutex_init(&pq->lock); + + snprintf(pq->pkt_slab_name, sizeof(pq->pkt_slab_name), + "ipath-user-sdma-pkts-%u-%02u.%02u", unit, port, sport); + pq->pkt_slab = kmem_cache_create(pq->pkt_slab_name, + sizeof(struct ipath_user_sdma_pkt), + 0, 0, NULL); + + if (!pq->pkt_slab) + goto err_kfree; + + snprintf(pq->header_cache_name, sizeof(pq->header_cache_name), + "ipath-user-sdma-headers-%u-%02u.%02u", unit, port, sport); + pq->header_cache = dma_pool_create(pq->header_cache_name, + dev, + IPATH_USER_SDMA_EXP_HEADER_LENGTH, + 4, 0); + if (!pq->header_cache) + goto err_slab; + + pq->dma_pages_root = RB_ROOT; + + goto done; + +err_slab: + kmem_cache_destroy(pq->pkt_slab); +err_kfree: + kfree(pq); + pq = NULL; + +done: + return pq; +} + +static void ipath_user_sdma_init_frag(struct ipath_user_sdma_pkt *pkt, + int i, size_t offset, size_t len, + int put_page, int dma_mapped, + struct page *page, + void *kvaddr, dma_addr_t dma_addr) +{ + pkt->addr[i].offset = offset; + pkt->addr[i].length = len; + pkt->addr[i].put_page = put_page; + pkt->addr[i].dma_mapped = dma_mapped; + pkt->addr[i].page = page; + pkt->addr[i].kvaddr = kvaddr; + pkt->addr[i].addr = dma_addr; +} + +static void ipath_user_sdma_init_header(struct ipath_user_sdma_pkt *pkt, + u32 counter, size_t offset, + size_t len, int dma_mapped, + struct page *page, + void *kvaddr, dma_addr_t dma_addr) +{ + pkt->naddr = 1; + pkt->counter = counter; + ipath_user_sdma_init_frag(pkt, 0, offset, len, 0, dma_mapped, page, + kvaddr, dma_addr); +} + +/* we've too many pages in the iovec, coalesce to a single page */ +static int ipath_user_sdma_coalesce(const struct ipath_devdata *dd, + struct 
ipath_user_sdma_pkt *pkt, + const struct iovec *iov, + unsigned long niov) { + int ret = 0; + struct page *page = alloc_page(GFP_KERNEL); + void *mpage_save; + char *mpage; + int i; + int len = 0; + dma_addr_t dma_addr; + + if (!page) { + ret = -ENOMEM; + goto done; + } + + mpage = kmap(page); + mpage_save = mpage; + for (i = 0; i < niov; i++) { + int cfur; + + cfur = copy_from_user(mpage, + iov[i].iov_base, iov[i].iov_len); + if (cfur) { + ret = -EFAULT; + goto free_unmap; + } + + mpage += iov[i].iov_len; + len += iov[i].iov_len; + } + + dma_addr = dma_map_page(&dd->pcidev->dev, page, 0, len, + DMA_TO_DEVICE); + if (dma_mapping_error(dma_addr)) { + ret = -ENOMEM; + goto free_unmap; + } + + ipath_user_sdma_init_frag(pkt, 1, 0, len, 0, 1, page, mpage_save, + dma_addr); + pkt->naddr = 2; + + goto done; + +free_unmap: + kunmap(page); + __free_page(page); +done: + return ret; +} + +/* how many pages in this iovec element? */ +static int ipath_user_sdma_num_pages(const struct iovec *iov) +{ + const unsigned long addr = (unsigned long) iov->iov_base; + const unsigned long len = iov->iov_len; + const unsigned long spage = addr & PAGE_MASK; + const unsigned long epage = (addr + len - 1) & PAGE_MASK; + + return 1 + ((epage - spage) >> PAGE_SHIFT); +} + +/* truncate length to page boundary */ +static int ipath_user_sdma_page_length(unsigned long addr, unsigned long len) +{ + const unsigned long offset = addr & ~PAGE_MASK; + + return ((offset + len) > PAGE_SIZE) ?
(PAGE_SIZE - offset) : len; +} + +static void ipath_user_sdma_free_pkt_frag(struct device *dev, + struct ipath_user_sdma_queue *pq, + struct ipath_user_sdma_pkt *pkt, + int frag) +{ + const int i = frag; + + if (pkt->addr[i].page) { + if (pkt->addr[i].dma_mapped) + dma_unmap_page(dev, + pkt->addr[i].addr, + pkt->addr[i].length, + DMA_TO_DEVICE); + + if (pkt->addr[i].kvaddr) + kunmap(pkt->addr[i].page); + + if (pkt->addr[i].put_page) + put_page(pkt->addr[i].page); + else + __free_page(pkt->addr[i].page); + } else if (pkt->addr[i].kvaddr) + /* free coherent mem from cache... */ + dma_pool_free(pq->header_cache, + pkt->addr[i].kvaddr, pkt->addr[i].addr); +} + +/* return number of pages pinned... */ +static int ipath_user_sdma_pin_pages(const struct ipath_devdata *dd, + struct ipath_user_sdma_pkt *pkt, + unsigned long addr, int tlen, int npages) +{ + struct page *pages[2]; + int j; + int ret; + + ret = get_user_pages(current, current->mm, addr, + npages, 0, 1, pages, NULL); + + if (ret != npages) { + int i; + + for (i = 0; i < ret; i++) + put_page(pages[i]); + + ret = -ENOMEM; + goto done; + } + + for (j = 0; j < npages; j++) { + /* map the pages... 
*/ + const int flen = + ipath_user_sdma_page_length(addr, tlen); + dma_addr_t dma_addr = + dma_map_page(&dd->pcidev->dev, + pages[j], 0, flen, DMA_TO_DEVICE); + unsigned long fofs = addr & ~PAGE_MASK; + + if (dma_mapping_error(dma_addr)) { + ret = -ENOMEM; + goto done; + } + + ipath_user_sdma_init_frag(pkt, pkt->naddr, fofs, flen, 1, 1, + pages[j], kmap(pages[j]), + dma_addr); + + pkt->naddr++; + addr += flen; + tlen -= flen; + } + +done: + return ret; +} + +static int ipath_user_sdma_pin_pkt(const struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + struct ipath_user_sdma_pkt *pkt, + const struct iovec *iov, + unsigned long niov) +{ + int ret = 0; + unsigned long idx; + + for (idx = 0; idx < niov; idx++) { + const int npages = ipath_user_sdma_num_pages(iov + idx); + const unsigned long addr = (unsigned long) iov[idx].iov_base; + + ret = ipath_user_sdma_pin_pages(dd, pkt, + addr, iov[idx].iov_len, + npages); + if (ret < 0) + goto free_pkt; + } + + goto done; + +free_pkt: + for (idx = 0; idx < pkt->naddr; idx++) + ipath_user_sdma_free_pkt_frag(&dd->pcidev->dev, pq, pkt, idx); + +done: + return ret; +} + +static int ipath_user_sdma_init_payload(const struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + struct ipath_user_sdma_pkt *pkt, + const struct iovec *iov, + unsigned long niov, int npages) +{ + int ret = 0; + + if (npages >= ARRAY_SIZE(pkt->addr)) + ret = ipath_user_sdma_coalesce(dd, pkt, iov, niov); + else + ret = ipath_user_sdma_pin_pkt(dd, pq, pkt, iov, niov); + + return ret; +} + +/* free a packet list -- return counter value of last packet */ +static void ipath_user_sdma_free_pkt_list(struct device *dev, + struct ipath_user_sdma_queue *pq, + struct list_head *list) +{ + struct ipath_user_sdma_pkt *pkt, *pkt_next; + + list_for_each_entry_safe(pkt, pkt_next, list, list) { + int i; + + for (i = 0; i < pkt->naddr; i++) + ipath_user_sdma_free_pkt_frag(dev, pq, pkt, i); + + kmem_cache_free(pq->pkt_slab, pkt); + } +} + +/* + * copy headers, 
coalesce etc -- pq->lock must be held + * + * we queue all the packets to list, returning the + * number of bytes total. list must be empty initially, + * as, if there is an error we clean it... + */ +static int ipath_user_sdma_queue_pkts(const struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + struct list_head *list, + const struct iovec *iov, + unsigned long niov, + int maxpkts) +{ + unsigned long idx = 0; + int ret = 0; + int npkts = 0; + struct page *page = NULL; + __le32 *pbc; + dma_addr_t dma_addr; + struct ipath_user_sdma_pkt *pkt = NULL; + size_t len; + size_t nw; + u32 counter = pq->counter; + int dma_mapped = 0; + + while (idx < niov && npkts < maxpkts) { + const unsigned long addr = (unsigned long) iov[idx].iov_base; + const unsigned long idx_save = idx; + unsigned pktnw; + unsigned pktnwc; + int nfrags = 0; + int npages = 0; + int cfur; + + dma_mapped = 0; + len = iov[idx].iov_len; + nw = len >> 2; + page = NULL; + + pkt = kmem_cache_alloc(pq->pkt_slab, GFP_KERNEL); + if (!pkt) { + ret = -ENOMEM; + goto free_list; + } + + if (len < IPATH_USER_SDMA_MIN_HEADER_LENGTH || + len > PAGE_SIZE || len & 3 || addr & 3) { + ret = -EINVAL; + goto free_pkt; + } + + if (len == IPATH_USER_SDMA_EXP_HEADER_LENGTH) + pbc = dma_pool_alloc(pq->header_cache, GFP_KERNEL, + &dma_addr); + else + pbc = NULL; + + if (!pbc) { + page = alloc_page(GFP_KERNEL); + if (!page) { + ret = -ENOMEM; + goto free_pkt; + } + pbc = kmap(page); + } + + cfur = copy_from_user(pbc, iov[idx].iov_base, len); + if (cfur) { + ret = -EFAULT; + goto free_pbc; + } + + /* + * this assignment is a bit strange. it's because the + * pbc counts the number of 32 bit words in the full + * packet _except_ the first word of the pbc itself... + */ + pktnwc = nw - 1; + + /* + * pktnw computation yields the number of 32 bit words + * that the caller has indicated in the PBC.
note that + * this is one less than the total number of words that + * goes to the send DMA engine as the first 32 bit word + * of the PBC itself is not counted. Armed with this count, + * we can verify that the packet is consistent with the + * iovec lengths. + */ + pktnw = le32_to_cpu(*pbc) & IPATH_PBC_LENGTH_MASK; + if (pktnw < pktnwc || pktnw > pktnwc + (PAGE_SIZE >> 2)) { + ret = -EINVAL; + goto free_pbc; + } + + + idx++; + while (pktnwc < pktnw && idx < niov) { + const size_t slen = iov[idx].iov_len; + const unsigned long faddr = + (unsigned long) iov[idx].iov_base; + + if (slen & 3 || faddr & 3 || !slen || + slen > PAGE_SIZE) { + ret = -EINVAL; + goto free_pbc; + } + + npages++; + if ((faddr & PAGE_MASK) != + ((faddr + slen - 1) & PAGE_MASK)) + npages++; + + pktnwc += slen >> 2; + idx++; + nfrags++; + } + + if (pktnwc != pktnw) { + ret = -EINVAL; + goto free_pbc; + } + + if (page) { + dma_addr = dma_map_page(&dd->pcidev->dev, + page, 0, len, DMA_TO_DEVICE); + if (dma_mapping_error(dma_addr)) { + ret = -ENOMEM; + goto free_pbc; + } + + dma_mapped = 1; + } + + ipath_user_sdma_init_header(pkt, counter, 0, len, dma_mapped, + page, pbc, dma_addr); + + if (nfrags) { + ret = ipath_user_sdma_init_payload(dd, pq, pkt, + iov + idx_save + 1, + nfrags, npages); + if (ret < 0) + goto free_pbc_dma; + } + + counter++; + npkts++; + + list_add_tail(&pkt->list, list); + } + + ret = idx; + goto done; + +free_pbc_dma: + if (dma_mapped) + dma_unmap_page(&dd->pcidev->dev, dma_addr, len, DMA_TO_DEVICE); +free_pbc: + if (page) { + kunmap(page); + __free_page(page); + } else + dma_pool_free(pq->header_cache, pbc, dma_addr); +free_pkt: + kmem_cache_free(pq->pkt_slab, pkt); +free_list: + ipath_user_sdma_free_pkt_list(&dd->pcidev->dev, pq, list); +done: + return ret; +} + +/* try to clean out queue -- needs pq->lock */ +static int ipath_user_sdma_queue_clean(const struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq) +{ + struct list_head free_list; + struct ipath_user_sdma_pkt 
*pkt; + struct ipath_user_sdma_pkt *pkt_prev; + int ret = 0; + + INIT_LIST_HEAD(&free_list); + + list_for_each_entry_safe(pkt, pkt_prev, &pq->sent, list) { + s64 descd = dd->ipath_sdma_descq_removed - pkt->added; + + if (descd < 0) + break; + + list_move_tail(&pkt->list, &free_list); + + /* one more packet cleaned */ + ret++; + } + + if (!list_empty(&free_list)) { + u32 counter; + + pkt = list_entry(free_list.prev, + struct ipath_user_sdma_pkt, list); + counter = pkt->counter; + + ipath_user_sdma_free_pkt_list(&dd->pcidev->dev, pq, &free_list); + ipath_user_sdma_set_complete_counter(pq, counter); + } + + return ret; +} + +void ipath_user_sdma_queue_destroy(struct ipath_user_sdma_queue *pq) +{ + if (!pq) + return; + + kmem_cache_destroy(pq->pkt_slab); + dma_pool_destroy(pq->header_cache); + kfree(pq); +} + +/* clean descriptor queue, returns > 0 if some elements cleaned */ +static int ipath_user_sdma_hwqueue_clean(struct ipath_devdata *dd) +{ + int ret; + unsigned long flags; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + ret = ipath_sdma_make_progress(dd); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + return ret; +} + +/* we're in close, drain packets so that we can cleanup successfully... 
*/ +void ipath_user_sdma_queue_drain(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq) +{ + int i; + + if (!pq) + return; + + for (i = 0; i < 100; i++) { + mutex_lock(&pq->lock); + if (list_empty(&pq->sent)) { + mutex_unlock(&pq->lock); + break; + } + ipath_user_sdma_hwqueue_clean(dd); + ipath_user_sdma_queue_clean(dd, pq); + mutex_unlock(&pq->lock); + msleep(10); + } + + if (!list_empty(&pq->sent)) { + struct list_head free_list; + + printk(KERN_INFO "drain: lists not empty: forcing!\n"); + INIT_LIST_HEAD(&free_list); + mutex_lock(&pq->lock); + list_splice_init(&pq->sent, &free_list); + ipath_user_sdma_free_pkt_list(&dd->pcidev->dev, pq, &free_list); + mutex_unlock(&pq->lock); + } +} + +static inline __le64 ipath_sdma_make_desc0(struct ipath_devdata *dd, + u64 addr, u64 dwlen, u64 dwoffset) +{ + return cpu_to_le64(/* SDmaPhyAddr[31:0] */ + ((addr & 0xfffffffcULL) << 32) | + /* SDmaGeneration[1:0] */ + ((dd->ipath_sdma_generation & 3ULL) << 30) | + /* SDmaDwordCount[10:0] */ + ((dwlen & 0x7ffULL) << 16) | + /* SDmaBufOffset[12:2] */ + (dwoffset & 0x7ffULL)); +} + +static inline __le64 ipath_sdma_make_first_desc0(__le64 descq) +{ + return descq | __constant_cpu_to_le64(1ULL << 12); +} + +static inline __le64 ipath_sdma_make_last_desc0(__le64 descq) +{ + /* last */ /* dma head */ + return descq | __constant_cpu_to_le64(1ULL << 11 | 1ULL << 13); +} + +static inline __le64 ipath_sdma_make_desc1(u64 addr) +{ + /* SDmaPhyAddr[47:32] */ + return cpu_to_le64(addr >> 32); +} + +static void ipath_user_sdma_send_frag(struct ipath_devdata *dd, + struct ipath_user_sdma_pkt *pkt, int idx, + unsigned ofs, u16 tail) +{ + const u64 addr = (u64) pkt->addr[idx].addr + + (u64) pkt->addr[idx].offset; + const u64 dwlen = (u64) pkt->addr[idx].length / 4; + __le64 *descqp; + __le64 descq0; + + descqp = &dd->ipath_sdma_descq[tail].qw[0]; + + descq0 = ipath_sdma_make_desc0(dd, addr, dwlen, ofs); + if (idx == 0) + descq0 = ipath_sdma_make_first_desc0(descq0); + if (idx == 
pkt->naddr - 1) + descq0 = ipath_sdma_make_last_desc0(descq0); + + descqp[0] = descq0; + descqp[1] = ipath_sdma_make_desc1(addr); +} + +/* pq->lock must be held, get packets on the wire... */ +static int ipath_user_sdma_push_pkts(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + struct list_head *pktlist) +{ + int ret = 0; + unsigned long flags; + u16 tail; + + if (list_empty(pktlist)) + return 0; + + if (unlikely(!(dd->ipath_flags & IPATH_LINKACTIVE))) + return -ECOMM; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + if (unlikely(dd->ipath_sdma_status & IPATH_SDMA_ABORT_MASK)) { + ret = -ECOMM; + goto unlock; + } + + tail = dd->ipath_sdma_descq_tail; + while (!list_empty(pktlist)) { + struct ipath_user_sdma_pkt *pkt = + list_entry(pktlist->next, struct ipath_user_sdma_pkt, + list); + int i; + unsigned ofs = 0; + u16 dtail = tail; + + if (pkt->naddr > ipath_sdma_descq_freecnt(dd)) + goto unlock_check_tail; + + for (i = 0; i < pkt->naddr; i++) { + ipath_user_sdma_send_frag(dd, pkt, i, ofs, tail); + ofs += pkt->addr[i].length >> 2; + + if (++tail == dd->ipath_sdma_descq_cnt) { + tail = 0; + ++dd->ipath_sdma_generation; + } + } + + if ((ofs<<2) > dd->ipath_ibmaxlen) { + ipath_dbg("packet size %X > ibmax %X, fail\n", + ofs<<2, dd->ipath_ibmaxlen); + ret = -EMSGSIZE; + goto unlock; + } + + /* + * if the packet is >= 2KB mtu equivalent, we have to use + * the large buffers, and have to mark each descriptor as + * part of a large buffer packet. 
+ */ + if (ofs >= IPATH_SMALLBUF_DWORDS) { + for (i = 0; i < pkt->naddr; i++) { + dd->ipath_sdma_descq[dtail].qw[0] |= + __constant_cpu_to_le64(1ULL << 14); + if (++dtail == dd->ipath_sdma_descq_cnt) + dtail = 0; + } + } + + dd->ipath_sdma_descq_added += pkt->naddr; + pkt->added = dd->ipath_sdma_descq_added; + list_move_tail(&pkt->list, &pq->sent); + ret++; + } + +unlock_check_tail: + /* advance the tail on the chip if necessary */ + if (dd->ipath_sdma_descq_tail != tail) { + wmb(); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmatail, tail); + dd->ipath_sdma_descq_tail = tail; + } + +unlock: + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + return ret; +} + +int ipath_user_sdma_writev(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + const struct iovec *iov, + unsigned long dim) +{ + int ret = 0; + struct list_head list; + int npkts = 0; + + INIT_LIST_HEAD(&list); + + mutex_lock(&pq->lock); + + if (dd->ipath_sdma_descq_added != dd->ipath_sdma_descq_removed) { + ipath_user_sdma_hwqueue_clean(dd); + ipath_user_sdma_queue_clean(dd, pq); + } + + while (dim) { + const int mxp = 8; + + down_write(¤t->mm->mmap_sem); + ret = ipath_user_sdma_queue_pkts(dd, pq, &list, iov, dim, mxp); + up_write(¤t->mm->mmap_sem); + + if (ret <= 0) + goto done_unlock; + else { + dim -= ret; + iov += ret; + } + + /* force packets onto the sdma hw queue... */ + if (!list_empty(&list)) { + /* + * lazily clean hw queue. the 4 is a guess of about + * how many sdma descriptors a packet will take (it + * doesn't have to be perfect). 
+ */ + if (ipath_sdma_descq_freecnt(dd) < ret * 4) { + ipath_user_sdma_hwqueue_clean(dd); + ipath_user_sdma_queue_clean(dd, pq); + } + + ret = ipath_user_sdma_push_pkts(dd, pq, &list); + if (ret < 0) + goto done_unlock; + else { + npkts += ret; + pq->counter += ret; + + if (!list_empty(&list)) + goto done_unlock; + } + } + } + +done_unlock: + if (!list_empty(&list)) + ipath_user_sdma_free_pkt_list(&dd->pcidev->dev, pq, &list); + mutex_unlock(&pq->lock); + + return (ret < 0) ? ret : npkts; +} + +int ipath_user_sdma_make_progress(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq) +{ + int ret = 0; + + mutex_lock(&pq->lock); + ipath_user_sdma_hwqueue_clean(dd); + ret = ipath_user_sdma_queue_clean(dd, pq); + mutex_unlock(&pq->lock); + + return ret; +} + +int ipath_user_sdma_pkt_sent(const struct ipath_user_sdma_queue *pq, + u32 counter) +{ + const u32 scounter = ipath_user_sdma_complete_counter(pq); + const s32 dcounter = scounter - counter; + + return dcounter >= 0; +} + +u32 ipath_user_sdma_complete_counter(const struct ipath_user_sdma_queue *pq) +{ + return pq->sent_counter; +} + +void ipath_user_sdma_set_complete_counter(struct ipath_user_sdma_queue *pq, + u32 c) +{ + pq->sent_counter = c; +} + +u32 ipath_user_sdma_inflight_counter(struct ipath_user_sdma_queue *pq) +{ + return pq->counter; +} + From ralph.campbell at qlogic.com Wed Apr 2 15:50:33 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:33 -0700 Subject: [ofa-general] [PATCH 18/20] IB/ipath - misc changes to prepare for iba7220 introduction In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225033.28598.56443.stgit@eng-46.mv.qlogic.com> From: Arthur Jones The patch adds a number of minor changes to support newer HCAs * New send buffer control bits * New error condition bits * Locking and initialization changes * More send buffers Signed-off-by: Ralph 
Campbell --- drivers/infiniband/hw/ipath/ipath_driver.c | 61 ++++++++++++++++++++----- drivers/infiniband/hw/ipath/ipath_file_ops.c | 2 - drivers/infiniband/hw/ipath/ipath_init_chip.c | 24 +++++++--- drivers/infiniband/hw/ipath/ipath_intr.c | 11 +++-- drivers/infiniband/hw/ipath/ipath_kernel.h | 1 drivers/infiniband/hw/ipath/ipath_sysfs.c | 18 ++++--- 6 files changed, 83 insertions(+), 34 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index b4a69ef..66982a9 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -89,6 +89,10 @@ MODULE_LICENSE("GPL"); MODULE_AUTHOR("QLogic "); MODULE_DESCRIPTION("QLogic InfiniPath driver"); +/* + * Table to translate the LINKTRAININGSTATE portion of + * IBCStatus to a human-readable form. + */ const char *ipath_ibcstatus_str[] = { "Disabled", "LinkUp", @@ -103,9 +107,20 @@ const char *ipath_ibcstatus_str[] = { "CfgWaitRmt", "CfgIdle", "RecovRetrain", - "LState0xD", /* unused */ + "CfgTxRevLane", /* unused before IBA7220 */ "RecovWaitRmt", "RecovIdle", + /* below were added for IBA7220 */ + "CfgEnhanced", + "CfgTest", + "CfgWaitRmtTest", + "CfgWaitCfgEnhanced", + "SendTS_T", + "SendTstIdles", + "RcvTS_T", + "SendTst_TS1s", + "LTState18", "LTState19", "LTState1A", "LTState1B", + "LTState1C", "LTState1D", "LTState1E", "LTState1F" }; static void __devexit ipath_remove_one(struct pci_dev *); @@ -333,7 +348,14 @@ static void ipath_verify_pioperf(struct ipath_devdata *dd) ipath_disable_armlaunch(dd); - writeq(0, piobuf); /* length 0, no dwords actually sent */ + /* + * length 0, no dwords actually sent, and mark as VL15 + * on chips where that may matter (due to IB flowcontrol) + */ + if ((dd->ipath_flags & IPATH_HAS_PBC_CNT)) + writeq(1UL << 63, piobuf); + else + writeq(0, piobuf); ipath_flush_wc(); /* @@ -374,6 +396,7 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, struct ipath_devdata *dd; unsigned long 
long addr; u32 bar0 = 0, bar1 = 0; + u8 rev; dd = ipath_alloc_devdata(pdev); if (IS_ERR(dd)) { @@ -405,7 +428,7 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, } addr = pci_resource_start(pdev, 0); len = pci_resource_len(pdev, 0); - ipath_cdbg(VERBOSE, "regbase (0) %llx len %d pdev->irq %d, vend %x/%x " + ipath_cdbg(VERBOSE, "regbase (0) %llx len %d irq %d, vend %x/%x " "driver_data %lx\n", addr, len, pdev->irq, ent->vendor, ent->device, ent->driver_data); @@ -530,7 +553,13 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, goto bail_regions; } - dd->ipath_pcirev = pdev->revision; + ret = pci_read_config_byte(pdev, PCI_REVISION_ID, &rev); + if (ret) { + ipath_dev_err(dd, "Failed to read PCI revision ID unit " + "%u: err %d\n", dd->ipath_unit, -ret); + goto bail_regions; /* shouldn't ever happen */ + } + dd->ipath_pcirev = rev; #if defined(__powerpc__) /* There isn't a generic way to specify writethrough mappings */ @@ -553,14 +582,6 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, ipath_cdbg(VERBOSE, "mapped io addr %llx to kregbase %p\n", addr, dd->ipath_kregbase); - /* - * clear ipath_flags here instead of in ipath_init_chip as it is set - * by ipath_setup_htconfig. 
- */ - dd->ipath_flags = 0; - dd->ipath_lli_counter = 0; - dd->ipath_lli_errors = 0; - if (dd->ipath_f_bus(dd, pdev)) ipath_dev_err(dd, "Failed to setup config space; " "continuing anyway\n"); @@ -649,6 +670,10 @@ static void __devexit cleanup_device(struct ipath_devdata *dd) ipath_disable_wc(dd); } + if (dd->ipath_spectriggerhit) + dev_info(&dd->pcidev->dev, "%lu special trigger hits\n", + dd->ipath_spectriggerhit); + if (dd->ipath_pioavailregs_dma) { dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, (void *) dd->ipath_pioavailregs_dma, @@ -857,7 +882,7 @@ int ipath_wait_linkstate(struct ipath_devdata *dd, u32 state, int msecs) (unsigned long long) ipath_read_kreg64( dd, dd->ipath_kregs->kr_ibcctrl), (unsigned long long) val, - ipath_ibcstatus_str[val & 0xf]); + ipath_ibcstatus_str[val & dd->ibcs_lts_mask]); } return (dd->ipath_flags & state) ? 0 : -ETIMEDOUT; } @@ -906,6 +931,8 @@ int ipath_decode_err(char *buf, size_t blen, ipath_err_t err) strlcat(buf, "rbadversion ", blen); if (err & INFINIPATH_E_RHDR) strlcat(buf, "rhdr ", blen); + if (err & INFINIPATH_E_SENDSPECIALTRIGGER) + strlcat(buf, "sendspecialtrigger ", blen); if (err & INFINIPATH_E_RLONGPKTLEN) strlcat(buf, "rlongpktlen ", blen); if (err & INFINIPATH_E_RMAXPKTLEN) @@ -948,6 +975,8 @@ int ipath_decode_err(char *buf, size_t blen, ipath_err_t err) strlcat(buf, "hardware ", blen); if (err & INFINIPATH_E_RESET) strlcat(buf, "reset ", blen); + if (err & INFINIPATH_E_INVALIDEEPCMD) + strlcat(buf, "invalideepromcmd ", blen); done: return iserr; } @@ -1701,6 +1730,10 @@ bail: */ void ipath_cancel_sends(struct ipath_devdata *dd, int restore_sendctrl) { + if (dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) { + ipath_cdbg(VERBOSE, "Ignore while in autonegotiation\n"); + goto bail; + } ipath_dbg("Cancelling all in-progress send buffers\n"); /* skip armlaunch errs for a while */ @@ -1721,6 +1754,7 @@ void ipath_cancel_sends(struct ipath_devdata *dd, int restore_sendctrl) /* and again, be sure all have hit the chip */ 
ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); +bail:; } /* @@ -2282,6 +2316,7 @@ static int __init infinipath_init(void) */ idr_init(&unit_table); if (!idr_pre_get(&unit_table, GFP_KERNEL)) { + printk(KERN_ERR IPATH_DRV_NAME ": idr_pre_get() failed\n"); ret = -ENOMEM; goto bail; } diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index eab69df..b87d312 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -2074,7 +2074,7 @@ static int ipath_close(struct inode *in, struct file *fp) pd->port_rcvnowait = pd->port_pionowait = 0; } if (pd->port_flag) { - ipath_dbg("port %u port_flag still set to 0x%lx\n", + ipath_cdbg(PROC, "port %u port_flag set: 0x%lx\n", pd->port_port, pd->port_flag); pd->port_flag = 0; } diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 8d8e572..c012e05 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -230,6 +230,15 @@ static int init_chip_first(struct ipath_devdata *dd) int ret = 0; u64 val; + spin_lock_init(&dd->ipath_kernel_tid_lock); + spin_lock_init(&dd->ipath_user_tid_lock); + spin_lock_init(&dd->ipath_sendctrl_lock); + spin_lock_init(&dd->ipath_sdma_lock); + spin_lock_init(&dd->ipath_gpio_lock); + spin_lock_init(&dd->ipath_eep_st_lock); + spin_lock_init(&dd->ipath_sdepb_lock); + mutex_init(&dd->ipath_eep_lock); + /* * skip cfgports stuff because we are not allocating memory, * and we don't want problems if the portcnt changed due to @@ -319,12 +328,6 @@ static int init_chip_first(struct ipath_devdata *dd) else ipath_dbg("%u 2k piobufs @ %p\n", dd->ipath_piobcnt2k, dd->ipath_pio2kbase); - spin_lock_init(&dd->ipath_user_tid_lock); - spin_lock_init(&dd->ipath_sendctrl_lock); - spin_lock_init(&dd->ipath_gpio_lock); - spin_lock_init(&dd->ipath_eep_st_lock); - mutex_init(&dd->ipath_eep_lock); - done: 
return ret; } @@ -553,7 +556,7 @@ static void enable_chip(struct ipath_devdata *dd, int reinit) static int init_housekeeping(struct ipath_devdata *dd, int reinit) { - char boardn[32]; + char boardn[40]; int ret = 0; /* @@ -800,7 +803,12 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) dd->ipath_pioupd_thresh = kpiobufs; } - dd->ipath_f_early_init(dd); + ret = dd->ipath_f_early_init(dd); + if (ret) { + ipath_dev_err(dd, "Early initialization failure\n"); + goto done; + } + /* * Cancel any possible active sends from early driver load. * Follows early_init because some chips have to initialize diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 3bad601..90b972f 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -73,7 +73,7 @@ static void ipath_clrpiobuf(struct ipath_devdata *dd, u32 pnum) * If rewrite is true, and bits are set in the sendbufferror registers, * we'll write to the buffer, for error recovery on parity errors. 
*/ -static void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite) +void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite) { u32 piobcnt; unsigned long sbuf[4]; @@ -87,12 +87,14 @@ static void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite) dd, dd->ipath_kregs->kr_sendbuffererror); sbuf[1] = ipath_read_kreg64( dd, dd->ipath_kregs->kr_sendbuffererror + 1); - if (piobcnt > 128) { + if (piobcnt > 128) sbuf[2] = ipath_read_kreg64( dd, dd->ipath_kregs->kr_sendbuffererror + 2); + if (piobcnt > 192) sbuf[3] = ipath_read_kreg64( dd, dd->ipath_kregs->kr_sendbuffererror + 3); - } + else + sbuf[3] = 0; if (sbuf[0] || sbuf[1] || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) { int i; @@ -365,7 +367,8 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, */ if (lastlts == INFINIPATH_IBCS_LT_STATE_POLLACTIVE || lastlts == INFINIPATH_IBCS_LT_STATE_POLLQUIET) { - if (++dd->ipath_ibpollcnt == 40) { + if (!(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) && + (++dd->ipath_ibpollcnt == 40)) { dd->ipath_flags |= IPATH_NOCABLE; *dd->ipath_statusp |= IPATH_STATUS_IB_NOCABLE; diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 8cdeab8..1d5adf6 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -1011,6 +1011,7 @@ void ipath_get_eeprom_info(struct ipath_devdata *); int ipath_update_eeprom_log(struct ipath_devdata *dd); void ipath_inc_eeprom_err(struct ipath_devdata *dd, u32 eidx, u32 incr); u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg); +void ipath_disarm_senderrbufs(struct ipath_devdata *, int); void ipath_force_pio_avail_update(struct ipath_devdata *); void signal_ib_event(struct ipath_devdata *dd, enum ib_event_type ev); diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c index 7961d26..2e6d2aa 100644 --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c +++ 
b/drivers/infiniband/hw/ipath/ipath_sysfs.c
@@ -34,6 +34,7 @@
 #include

 #include "ipath_kernel.h"
+#include "ipath_verbs.h"
 #include "ipath_common.h"

 /**
@@ -320,6 +321,8 @@ static ssize_t store_guid(struct device *dev,

 	dd->ipath_guid = new_guid;
 	dd->ipath_nguid = 1;
+	if (dd->verbs_dev)
+		dd->verbs_dev->ibdev.node_guid = new_guid;

 	ret = strlen(buf);
 	goto bail;
@@ -928,18 +931,17 @@ static ssize_t store_rx_polinv_enb(struct device *dev,
 	u16 val;

 	ret = ipath_parse_ushort(buf, &val);
-	if (ret < 0 || val > 1)
-		goto invalid;
+	if (ret >= 0 && val > 1) {
+		ipath_dev_err(dd,
+			"attempt to set invalid Rx Polarity (enable)\n");
+		ret = -EINVAL;
+		goto bail;
+	}

 	r = dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_RXPOL_ENB, val);
-	if (r < 0) {
+	if (r < 0)
 		ret = r;
-		goto bail;
-	}
-
-	goto bail;
-invalid:
-	ipath_dev_err(dd, "attempt to set invalid Rx Polarity (enable)\n");

 bail:
 	return ret;
 }

From ralph.campbell at qlogic.com Wed Apr 2 15:50:38 2008
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Wed, 02 Apr 2008 15:50:38 -0700
Subject: [ofa-general] [PATCH 19/20] IB/ipath - add calls to new 7220 code and enable in build
In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com>
References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com>
Message-ID: <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com>

From: Dave Olson

This patch adds the initialization calls into the new 7220 HCA files, changes the Makefile to compile and link the new files, and adds code to handle send DMA.
Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/Makefile | 3 drivers/infiniband/hw/ipath/ipath_common.h | 16 + drivers/infiniband/hw/ipath/ipath_driver.c | 150 ++++++++-- drivers/infiniband/hw/ipath/ipath_file_ops.c | 97 ++++++ drivers/infiniband/hw/ipath/ipath_init_chip.c | 4 drivers/infiniband/hw/ipath/ipath_intr.c | 239 +++++++++++---- drivers/infiniband/hw/ipath/ipath_kernel.h | 4 drivers/infiniband/hw/ipath/ipath_qp.c | 14 + drivers/infiniband/hw/ipath/ipath_ruc.c | 18 + drivers/infiniband/hw/ipath/ipath_sdma.c | 91 ++++-- drivers/infiniband/hw/ipath/ipath_stats.c | 4 drivers/infiniband/hw/ipath/ipath_ud.c | 1 drivers/infiniband/hw/ipath/ipath_verbs.c | 391 ++++++++++++++++++++++++- 13 files changed, 896 insertions(+), 136 deletions(-) diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile index fe67388..75a6c91 100644 --- a/drivers/infiniband/hw/ipath/Makefile +++ b/drivers/infiniband/hw/ipath/Makefile @@ -20,17 +20,20 @@ ib_ipath-y := \ ipath_qp.o \ ipath_rc.o \ ipath_ruc.o \ + ipath_sdma.o \ ipath_srq.o \ ipath_stats.o \ ipath_sysfs.o \ ipath_uc.o \ ipath_ud.o \ ipath_user_pages.o \ + ipath_user_sdma.o \ ipath_verbs_mcast.o \ ipath_verbs.o ib_ipath-$(CONFIG_HT_IRQ) += ipath_iba6110.o ib_ipath-$(CONFIG_PCI_MSI) += ipath_iba6120.o +ib_ipath-$(CONFIG_PCI_MSI) += ipath_iba7220.o ipath_sd7220.o ipath_sd7220_img.o ib_ipath-$(CONFIG_X86_64) += ipath_wc_x86_64.o ib_ipath-$(CONFIG_PPC64) += ipath_wc_ppc64.o diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 02fd310..2cf7cd2 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -447,8 +447,9 @@ struct ipath_user_info { #define IPATH_CMD_PIOAVAILUPD 27 /* force an update of PIOAvail reg */ #define IPATH_CMD_POLL_TYPE 28 /* set the kind of polling we want */ #define IPATH_CMD_ARMLAUNCH_CTRL 29 /* armlaunch detection control */ - -#define IPATH_CMD_MAX 29 +/* 30 is 
unused */ +#define IPATH_CMD_SDMA_INFLIGHT 31 /* sdma inflight counter request */ +#define IPATH_CMD_SDMA_COMPLETE 32 /* sdma completion counter request */ /* * Poll types @@ -486,6 +487,17 @@ struct ipath_cmd { union { struct ipath_tid_info tid_info; struct ipath_user_info user_info; + + /* + * address in userspace where we should put the sdma + * inflight counter + */ + __u64 sdma_inflight; + /* + * address in userspace where we should put the sdma + * completion counter + */ + __u64 sdma_complete; /* address in userspace of struct ipath_port_info to write result to */ __u64 port_info; diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 66982a9..8ccc915 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -129,8 +129,10 @@ static int __devinit ipath_init_one(struct pci_dev *, /* Only needed for registration, nothing else needs this info */ #define PCI_VENDOR_ID_PATHSCALE 0x1fc1 +#define PCI_VENDOR_ID_QLOGIC 0x1077 #define PCI_DEVICE_ID_INFINIPATH_HT 0xd #define PCI_DEVICE_ID_INFINIPATH_PE800 0x10 +#define PCI_DEVICE_ID_INFINIPATH_7220 0x7220 /* Number of seconds before our card status check... 
*/ #define STATUS_TIMEOUT 60 @@ -138,6 +140,7 @@ static int __devinit ipath_init_one(struct pci_dev *, static const struct pci_device_id ipath_pci_tbl[] = { { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_HT) }, { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_PE800) }, + { PCI_DEVICE(PCI_VENDOR_ID_QLOGIC, PCI_DEVICE_ID_INFINIPATH_7220) }, { 0, } }; @@ -532,6 +535,13 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, "CONFIG_PCI_MSI is not enabled\n", ent->device); return -ENODEV; #endif + case PCI_DEVICE_ID_INFINIPATH_7220: +#ifndef CONFIG_PCI_MSI + ipath_dbg("CONFIG_PCI_MSI is not enabled, " + "using IntX for unit %u\n", dd->ipath_unit); +#endif + ipath_init_iba7220_funcs(dd); + break; default: ipath_dev_err(dd, "Found unknown QLogic deviceid 0x%x, " "failing\n", ent->device); @@ -887,13 +897,47 @@ int ipath_wait_linkstate(struct ipath_devdata *dd, u32 state, int msecs) return (dd->ipath_flags & state) ? 0 : -ETIMEDOUT; } +static void decode_sdma_errs(struct ipath_devdata *dd, ipath_err_t err, + char *buf, size_t blen) +{ + static const struct { + ipath_err_t err; + const char *msg; + } errs[] = { + { INFINIPATH_E_SDMAGENMISMATCH, "SDmaGenMismatch" }, + { INFINIPATH_E_SDMAOUTOFBOUND, "SDmaOutOfBound" }, + { INFINIPATH_E_SDMATAILOUTOFBOUND, "SDmaTailOutOfBound" }, + { INFINIPATH_E_SDMABASE, "SDmaBase" }, + { INFINIPATH_E_SDMA1STDESC, "SDma1stDesc" }, + { INFINIPATH_E_SDMARPYTAG, "SDmaRpyTag" }, + { INFINIPATH_E_SDMADWEN, "SDmaDwEn" }, + { INFINIPATH_E_SDMAMISSINGDW, "SDmaMissingDw" }, + { INFINIPATH_E_SDMAUNEXPDATA, "SDmaUnexpData" }, + { INFINIPATH_E_SDMADESCADDRMISALIGN, "SDmaDescAddrMisalign" }, + { INFINIPATH_E_SENDBUFMISUSE, "SendBufMisuse" }, + { INFINIPATH_E_SDMADISABLED, "SDmaDisabled" }, + }; + int i; + int expected; + size_t bidx = 0; + + for (i = 0; i < ARRAY_SIZE(errs); i++) { + expected = (errs[i].err != INFINIPATH_E_SDMADISABLED) ? 
0 : + test_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + if ((err & errs[i].err) && !expected) + bidx += snprintf(buf + bidx, blen - bidx, + "%s ", errs[i].msg); + } +} + /* * Decode the error status into strings, deciding whether to always * print * it or not depending on "normal packet errors" vs everything * else. Return 1 if "real" errors, otherwise 0 if only packet * errors, so caller can decide what to print with the string. */ -int ipath_decode_err(char *buf, size_t blen, ipath_err_t err) +int ipath_decode_err(struct ipath_devdata *dd, char *buf, size_t blen, + ipath_err_t err) { int iserr = 1; *buf = '\0'; @@ -975,6 +1019,8 @@ int ipath_decode_err(char *buf, size_t blen, ipath_err_t err) strlcat(buf, "hardware ", blen); if (err & INFINIPATH_E_RESET) strlcat(buf, "reset ", blen); + if (err & INFINIPATH_E_SDMAERRS) + decode_sdma_errs(dd, err, buf, blen); if (err & INFINIPATH_E_INVALIDEEPCMD) strlcat(buf, "invalideepromcmd ", blen); done: @@ -1730,30 +1776,80 @@ bail: */ void ipath_cancel_sends(struct ipath_devdata *dd, int restore_sendctrl) { + unsigned long flags; + if (dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) { ipath_cdbg(VERBOSE, "Ignore while in autonegotiation\n"); goto bail; } + /* + * If we have SDMA, and it's not disabled, we have to kick off the + * abort state machine, provided we aren't already aborting. + * If we are in the process of aborting SDMA (!DISABLED, but ABORTING), + * we skip the rest of this routine. 
It is already "in progress" + */ + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) { + int skip_cancel; + u64 *statp = &dd->ipath_sdma_status; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + skip_cancel = + !test_bit(IPATH_SDMA_DISABLED, statp) && + test_and_set_bit(IPATH_SDMA_ABORTING, statp); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (skip_cancel) + goto bail; + } + ipath_dbg("Cancelling all in-progress send buffers\n"); /* skip armlaunch errs for a while */ dd->ipath_lastcancel = jiffies + HZ / 2; /* - * the abort bit is auto-clearing. We read scratch to be sure - * that cancels and the abort have taken effect in the chip. + * The abort bit is auto-clearing. We also don't want pioavail + * update happening during this, and we don't want any other + * sends going out, so turn those off for the duration. We read + * the scratch register to be sure that cancels and the abort + * have taken effect in the chip. Otherwise two parts are same + * as ipath_force_pio_avail_update() */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl &= ~(INFINIPATH_S_PIOBUFAVAILUPD + | INFINIPATH_S_PIOENABLE); ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - INFINIPATH_S_ABORT); + dd->ipath_sendctrl | INFINIPATH_S_ABORT); ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + + /* disarm all send buffers */ ipath_disarm_piobufs(dd, 0, - (unsigned)(dd->ipath_piobcnt2k + dd->ipath_piobcnt4k)); - if (restore_sendctrl) /* else done by caller later */ + dd->ipath_piobcnt2k + dd->ipath_piobcnt4k); + + if (restore_sendctrl) { + /* else done by caller later if needed */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl |= INFINIPATH_S_PIOBUFAVAILUPD | + INFINIPATH_S_PIOENABLE; ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - dd->ipath_sendctrl); + dd->ipath_sendctrl); + /* and again, be sure all have hit the chip */ + ipath_read_kreg64(dd, 
dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + } - /* and again, be sure all have hit the chip */ - ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + if ((dd->ipath_flags & IPATH_HAS_SEND_DMA) && + !test_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status) && + test_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status)) { + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + /* only wait so long for intr */ + dd->ipath_sdma_abort_intr_timeout = jiffies + HZ; + dd->ipath_sdma_reset_wait = 200; + __set_bit(IPATH_SDMA_DISARMED, &dd->ipath_sdma_status); + if (!test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + tasklet_hi_schedule(&dd->ipath_sdma_abort_task); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + } bail:; } @@ -1952,7 +2048,7 @@ bail: * sanity checking on this, and we don't deal with what happens to * programs that are already running when the size changes. * NOTE: changing the MTU will usually cause the IBC to go back to - * link initialize (IPATH_IBSTATE_INIT) state... + * link INIT state... */ int ipath_set_mtu(struct ipath_devdata *dd, u16 arg) { @@ -2092,9 +2188,8 @@ static void ipath_run_led_override(unsigned long opaque) * but leave that to per-chip functions. */ val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); - ltstate = (val >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) & - dd->ibcs_lts_mask; - lstate = (val >> dd->ibcs_ls_shift) & INFINIPATH_IBCS_LINKSTATE_MASK; + ltstate = ipath_ib_linktrstate(dd, val); + lstate = ipath_ib_linkstate(dd, val); dd->ipath_f_setextled(dd, lstate, ltstate); mod_timer(&dd->ipath_led_override_timer, jiffies + timeoff); @@ -2170,6 +2265,9 @@ void ipath_shutdown_device(struct ipath_devdata *dd) ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + teardown_sdma(dd); + /* * gracefully stop all sends allowing any in progress to trickle out * first. 
@@ -2187,9 +2285,16 @@ void ipath_shutdown_device(struct ipath_devdata *dd) */ udelay(5); + dd->ipath_f_setextled(dd, 0, 0); /* make sure LEDs are off */ + ipath_set_ib_lstate(dd, 0, INFINIPATH_IBCC_LINKINITCMD_DISABLE); ipath_cancel_sends(dd, 0); + /* + * we are shutting down, so tell components that care. We don't do + * this on just a link state change, much like ethernet, a cable + * unplug, etc. doesn't change driver state + */ signal_ib_event(dd, IB_EVENT_PORT_ERR); /* disable IBC */ @@ -2214,6 +2319,10 @@ void ipath_shutdown_device(struct ipath_devdata *dd) del_timer_sync(&dd->ipath_intrchk_timer); dd->ipath_intrchk_timer.data = 0; } + if (atomic_read(&dd->ipath_led_override_timer_active)) { + del_timer_sync(&dd->ipath_led_override_timer); + atomic_set(&dd->ipath_led_override_timer_active, 0); + } /* * clear all interrupts and errors, so that the next time the driver @@ -2408,13 +2517,18 @@ int ipath_reset_device(int unit) } } + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + teardown_sdma(dd); + dd->ipath_flags &= ~IPATH_INITTED; + ipath_write_kreg(dd, dd->ipath_kregs->kr_intmask, 0ULL); ret = dd->ipath_f_reset(dd); - if (ret != 1) - ipath_dbg("reset was not successful\n"); - ipath_dbg("Trying to reinitialize unit %u after reset attempt\n", - unit); - ret = ipath_init_chip(dd, 1); + if (ret == 1) { + ipath_dbg("Reinitializing unit %u after reset attempt\n", + unit); + ret = ipath_init_chip(dd, 1); + } else + ret = -EAGAIN; if (ret) ipath_dev_err(dd, "Reinitialize unit %u after " "reset failed with %d\n", unit, ret); diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index b87d312..d38ba29 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -36,21 +36,28 @@ #include #include #include +#include +#include +#include #include #include "ipath_kernel.h" #include "ipath_common.h" +#include "ipath_user_sdma.h" static int ipath_open(struct inode *, struct 
file *); static int ipath_close(struct inode *, struct file *); static ssize_t ipath_write(struct file *, const char __user *, size_t, loff_t *); +static ssize_t ipath_writev(struct kiocb *, const struct iovec *, + unsigned long , loff_t); static unsigned int ipath_poll(struct file *, struct poll_table_struct *); static int ipath_mmap(struct file *, struct vm_area_struct *); static const struct file_operations ipath_file_ops = { .owner = THIS_MODULE, .write = ipath_write, + .aio_write = ipath_writev, .open = ipath_open, .release = ipath_close, .poll = ipath_poll, @@ -1870,10 +1877,9 @@ static int ipath_assign_port(struct file *fp, if (ipath_compatible_subports(swmajor, swminor) && uinfo->spu_subport_cnt && (ret = find_shared_port(fp, uinfo))) { - mutex_unlock(&ipath_mutex); if (ret > 0) ret = 0; - goto done; + goto done_chk_sdma; } i_minor = iminor(fp->f_path.dentry->d_inode) - IPATH_USER_MINOR_BASE; @@ -1885,6 +1891,21 @@ static int ipath_assign_port(struct file *fp, else ret = find_best_unit(fp, uinfo); +done_chk_sdma: + if (!ret) { + struct ipath_filedata *fd = fp->private_data; + const struct ipath_portdata *pd = fd->pd; + const struct ipath_devdata *dd = pd->port_dd; + + fd->pq = ipath_user_sdma_queue_create(&dd->pcidev->dev, + dd->ipath_unit, + pd->port_port, + fd->subport); + + if (!fd->pq) + ret = -ENOMEM; + } + mutex_unlock(&ipath_mutex); done: @@ -2042,6 +2063,13 @@ static int ipath_close(struct inode *in, struct file *fp) mutex_unlock(&ipath_mutex); goto bail; } + + dd = pd->port_dd; + + /* drain user sdma queue */ + ipath_user_sdma_queue_drain(dd, fd->pq); + ipath_user_sdma_queue_destroy(fd->pq); + if (--pd->port_cnt) { /* * XXX If the master closes the port before the slave(s), @@ -2054,7 +2082,6 @@ static int ipath_close(struct inode *in, struct file *fp) goto bail; } port = pd->port_port; - dd = pd->port_dd; if (pd->port_hdrqfull) { ipath_cdbg(PROC, "%s[%u] had %u rcvhdrqfull errors " @@ -2176,6 +2203,35 @@ static int ipath_get_slave_info(struct 
ipath_portdata *pd, return ret; } +static int ipath_sdma_get_inflight(struct ipath_user_sdma_queue *pq, + u32 __user *inflightp) +{ + const u32 val = ipath_user_sdma_inflight_counter(pq); + + if (put_user(val, inflightp)) + return -EFAULT; + + return 0; +} + +static int ipath_sdma_get_complete(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + u32 __user *completep) +{ + u32 val; + int err; + + err = ipath_user_sdma_make_progress(dd, pq); + if (err < 0) + return err; + + val = ipath_user_sdma_complete_counter(pq); + if (put_user(val, completep)) + return -EFAULT; + + return 0; +} + static ssize_t ipath_write(struct file *fp, const char __user *data, size_t count, loff_t *off) { @@ -2250,6 +2306,16 @@ static ssize_t ipath_write(struct file *fp, const char __user *data, dest = &cmd.cmd.armlaunch_ctrl; src = &ucmd->cmd.armlaunch_ctrl; break; + case IPATH_CMD_SDMA_INFLIGHT: + copy = sizeof(cmd.cmd.sdma_inflight); + dest = &cmd.cmd.sdma_inflight; + src = &ucmd->cmd.sdma_inflight; + break; + case IPATH_CMD_SDMA_COMPLETE: + copy = sizeof(cmd.cmd.sdma_complete); + dest = &cmd.cmd.sdma_complete; + src = &ucmd->cmd.sdma_complete; + break; default: ret = -EINVAL; goto bail; @@ -2331,6 +2397,17 @@ static ssize_t ipath_write(struct file *fp, const char __user *data, else ipath_disable_armlaunch(pd->port_dd); break; + case IPATH_CMD_SDMA_INFLIGHT: + ret = ipath_sdma_get_inflight(user_sdma_queue_fp(fp), + (u32 __user *) (unsigned long) + cmd.cmd.sdma_inflight); + break; + case IPATH_CMD_SDMA_COMPLETE: + ret = ipath_sdma_get_complete(pd->port_dd, + user_sdma_queue_fp(fp), + (u32 __user *) (unsigned long) + cmd.cmd.sdma_complete); + break; } if (ret >= 0) @@ -2340,6 +2417,20 @@ bail: return ret; } +static ssize_t ipath_writev(struct kiocb *iocb, const struct iovec *iov, + unsigned long dim, loff_t off) +{ + struct file *filp = iocb->ki_filp; + struct ipath_filedata *fp = filp->private_data; + struct ipath_portdata *pd = port_fp(filp); + struct ipath_user_sdma_queue *pq 
= fp->pq; + + if (!dim) + return -EINVAL; + + return ipath_user_sdma_writev(pd->port_dd, pq, iov, dim); +} + static struct class *ipath_class; static int init_cdev(int minor, char *name, const struct file_operations *fops, diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index c012e05..b43c2a1 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -980,6 +980,10 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) dd->ipath_stats_timer_active = 1; } + /* Set up SendDMA if chip supports it */ + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + ret = setup_sdma(dd); + /* Set up HoL state */ init_timer(&dd->ipath_hol_timer); dd->ipath_hol_timer.function = ipath_hol_event; diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 90b972f..d0088d5 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -433,6 +433,8 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, dd->ipath_flags &= ~(IPATH_LINKUNK | IPATH_LINKINIT | IPATH_LINKDOWN | IPATH_LINKARMED | IPATH_NOCABLE); + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + ipath_restart_sdma(dd); signal_ib_event(dd, IB_EVENT_PORT_ACTIVE); /* LED active not handled in chip _f_updown */ dd->ipath_f_setextled(dd, lstate, ltstate); @@ -480,7 +482,7 @@ done: } static void handle_supp_msgs(struct ipath_devdata *dd, - unsigned supp_msgs, char *msg, int msgsz) + unsigned supp_msgs, char *msg, u32 msgsz) { /* * Print the message unless it's ibc status change only, which @@ -488,12 +490,19 @@ static void handle_supp_msgs(struct ipath_devdata *dd, */ if (dd->ipath_lasterror & ~INFINIPATH_E_IBSTATUSCHANGED) { int iserr; - iserr = ipath_decode_err(msg, msgsz, + ipath_err_t mask; + iserr = ipath_decode_err(dd, msg, msgsz, dd->ipath_lasterror & ~INFINIPATH_E_IBSTATUSCHANGED); - if (dd->ipath_lasterror & - 
~(INFINIPATH_E_RRCVEGRFULL | - INFINIPATH_E_RRCVHDRFULL | INFINIPATH_E_PKTERRS)) + + mask = INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_PKTERRS | INFINIPATH_E_SDMADISABLED; + + /* if we're in debug, then don't mask SDMADISABLED msgs */ + if (ipath_debug & __IPATH_DBG) + mask &= ~INFINIPATH_E_SDMADISABLED; + + if (dd->ipath_lasterror & ~mask) ipath_dev_err(dd, "Suppressed %u messages for " "fast-repeating errors (%s) (%llx)\n", supp_msgs, msg, @@ -520,7 +529,7 @@ static void handle_supp_msgs(struct ipath_devdata *dd, static unsigned handle_frequent_errors(struct ipath_devdata *dd, ipath_err_t errs, char *msg, - int msgsz, int *noprint) + u32 msgsz, int *noprint) { unsigned long nc; static unsigned long nextmsg_time; @@ -550,19 +559,125 @@ static unsigned handle_frequent_errors(struct ipath_devdata *dd, return supp_msgs; } +static void handle_sdma_errors(struct ipath_devdata *dd, ipath_err_t errs) +{ + unsigned long flags; + int expected; + + if (ipath_debug & __IPATH_DBG) { + char msg[128]; + ipath_decode_err(dd, msg, sizeof msg, errs & + INFINIPATH_E_SDMAERRS); + ipath_dbg("errors %lx (%s)\n", (unsigned long)errs, msg); + } + if (ipath_debug & __IPATH_VERBDBG) { + unsigned long tl, hd, status, lengen; + tl = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmatail); + hd = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmahead); + status = ipath_read_kreg64(dd + , dd->ipath_kregs->kr_senddmastatus); + lengen = ipath_read_kreg64(dd, + dd->ipath_kregs->kr_senddmalengen); + ipath_cdbg(VERBOSE, "sdma tl 0x%lx hd 0x%lx status 0x%lx " + "lengen 0x%lx\n", tl, hd, status, lengen); + } + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + __set_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + expected = test_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (!expected) + ipath_cancel_sends(dd, 1); +} + +static void handle_sdma_intr(struct ipath_devdata *dd, u64 istat) +{ + unsigned long 
flags; + int expected; + + if ((istat & INFINIPATH_I_SDMAINT) && + !test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + ipath_sdma_intr(dd); + + if (istat & INFINIPATH_I_SDMADISABLED) { + expected = test_bit(IPATH_SDMA_ABORTING, + &dd->ipath_sdma_status); + ipath_dbg("%s SDmaDisabled intr\n", + expected ? "expected" : "unexpected"); + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + __set_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (!expected) + ipath_cancel_sends(dd, 1); + if (!test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + tasklet_hi_schedule(&dd->ipath_sdma_abort_task); + } +} + +static int handle_hdrq_full(struct ipath_devdata *dd) +{ + int chkerrpkts = 0; + u32 hd, tl; + u32 i; + + ipath_stats.sps_hdrqfull++; + for (i = 0; i < dd->ipath_cfgports; i++) { + struct ipath_portdata *pd = dd->ipath_pd[i]; + + if (i == 0) { + /* + * For kernel receive queues, we just want to know + * if there are packets in the queue that we can + * process. + */ + if (pd->port_head != ipath_get_hdrqtail(pd)) + chkerrpkts |= 1 << i; + continue; + } + + /* Skip if user context is not open */ + if (!pd || !pd->port_cnt) + continue; + + /* Don't report the same point multiple times. 
*/ + if (dd->ipath_flags & IPATH_NODMA_RTAIL) + tl = ipath_read_ureg32(dd, ur_rcvhdrtail, i); + else + tl = ipath_get_rcvhdrtail(pd); + if (tl == pd->port_lastrcvhdrqtail) + continue; + + hd = ipath_read_ureg32(dd, ur_rcvhdrhead, i); + if (hd == (tl + 1) || (!hd && tl == dd->ipath_hdrqlast)) { + pd->port_lastrcvhdrqtail = tl; + pd->port_hdrqfull++; + /* flush hdrqfull so that poll() sees it */ + wmb(); + wake_up_interruptible(&pd->port_wait); + } + } + + return chkerrpkts; +} + static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) { char msg[128]; u64 ignore_this_time = 0; - int i, iserr = 0; + u64 iserr = 0; int chkerrpkts = 0, noprint = 0; unsigned supp_msgs; int log_idx; - supp_msgs = handle_frequent_errors(dd, errs, msg, sizeof msg, &noprint); + /* + * don't report errors that are masked, either at init + * (not set in ipath_errormask), or temporarily (set in + * ipath_maskederrs) + */ + errs &= dd->ipath_errormask & ~dd->ipath_maskederrs; - /* don't report errors that are masked */ - errs &= ~dd->ipath_maskederrs; + supp_msgs = handle_frequent_errors(dd, errs, msg, (u32)sizeof msg, + &noprint); /* do these first, they are most important */ if (errs & INFINIPATH_E_HARDWARE) { @@ -577,6 +692,9 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) } } + if (errs & INFINIPATH_E_SDMAERRS) + handle_sdma_errors(dd, errs); + if (!noprint && (errs & ~dd->ipath_e_bitsextant)) ipath_dev_err(dd, "error interrupt with unknown errors " "%llx set\n", (unsigned long long) @@ -611,7 +729,7 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) dd->ipath_errormask &= ~dd->ipath_maskederrs; ipath_write_kreg(dd, dd->ipath_kregs->kr_errormask, dd->ipath_errormask); - s_iserr = ipath_decode_err(msg, sizeof msg, + s_iserr = ipath_decode_err(dd, msg, sizeof msg, dd->ipath_maskederrs); if (dd->ipath_maskederrs & @@ -661,26 +779,43 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) INFINIPATH_E_IBSTATUSCHANGED); } - 
/* likely due to cancel, so suppress */ + if (errs & INFINIPATH_E_SENDSPECIALTRIGGER) { + dd->ipath_spectriggerhit++; + ipath_dbg("%lu special trigger hits\n", + dd->ipath_spectriggerhit); + } + + /* likely due to cancel; so suppress message unless verbose */ if ((errs & (INFINIPATH_E_SPKTLEN | INFINIPATH_E_SPIOARMLAUNCH)) && dd->ipath_lastcancel > jiffies) { - ipath_dbg("Suppressed armlaunch/spktlen after error send cancel\n"); + /* armlaunch takes precedence; it often causes both. */ + ipath_cdbg(VERBOSE, + "Suppressed %s error (%llx) after sendbuf cancel\n", + (errs & INFINIPATH_E_SPIOARMLAUNCH) ? + "armlaunch" : "sendpktlen", (unsigned long long)errs); errs &= ~(INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SPKTLEN); } if (!errs) return 0; - if (!noprint) + if (!noprint) { + ipath_err_t mask; /* - * the ones we mask off are handled specially below or above + * The ones we mask off are handled specially below + * or above. Also mask SDMADISABLED by default as it + * is too chatty. */ - ipath_decode_err(msg, sizeof msg, - errs & ~(INFINIPATH_E_IBSTATUSCHANGED | - INFINIPATH_E_RRCVEGRFULL | - INFINIPATH_E_RRCVHDRFULL | - INFINIPATH_E_HARDWARE)); - else + mask = INFINIPATH_E_IBSTATUSCHANGED | + INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_HARDWARE | INFINIPATH_E_SDMADISABLED; + + /* if we're in debug, then don't mask SDMADISABLED msgs */ + if (ipath_debug & __IPATH_DBG) + mask &= ~INFINIPATH_E_SDMADISABLED; + + ipath_decode_err(dd, msg, sizeof msg, errs & ~mask); + } else /* so we don't need if (!noprint) at strlcat's below */ *msg = 0; @@ -705,39 +840,8 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) * fast_stats, no more than every 5 seconds, user ports get printed * on close */ - if (errs & INFINIPATH_E_RRCVHDRFULL) { - u32 hd, tl; - ipath_stats.sps_hdrqfull++; - for (i = 0; i < dd->ipath_cfgports; i++) { - struct ipath_portdata *pd = dd->ipath_pd[i]; - if (i == 0) { - hd = pd->port_head; - tl = ipath_get_hdrqtail(pd); 
- } else if (pd && pd->port_cnt && - pd->port_rcvhdrtail_kvaddr) { - /* - * don't report same point multiple times, - * except kernel - */ - tl = *(u64 *) pd->port_rcvhdrtail_kvaddr; - if (tl == pd->port_lastrcvhdrqtail) - continue; - hd = ipath_read_ureg32(dd, ur_rcvhdrhead, - i); - } else - continue; - if (hd == (tl + 1) || - (!hd && tl == dd->ipath_hdrqlast)) { - if (i == 0) - chkerrpkts = 1; - pd->port_lastrcvhdrqtail = tl; - pd->port_hdrqfull++; - /* flush hdrqfull so that poll() sees it */ - wmb(); - wake_up_interruptible(&pd->port_wait); - } - } - } + if (errs & INFINIPATH_E_RRCVHDRFULL) + chkerrpkts |= handle_hdrq_full(dd); if (errs & INFINIPATH_E_RRCVEGRFULL) { struct ipath_portdata *pd = dd->ipath_pd[0]; @@ -749,7 +853,7 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) */ ipath_stats.sps_etidfull++; if (pd->port_head != ipath_get_hdrqtail(pd)) - chkerrpkts = 1; + chkerrpkts |= 1; } /* @@ -788,9 +892,6 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) if (!noprint && *msg) { if (iserr) ipath_dev_err(dd, "%s error\n", msg); - else - dev_info(&dd->pcidev->dev, "%s packet problems\n", - msg); } if (dd->ipath_state_wanted & dd->ipath_flags) { ipath_cdbg(VERBOSE, "driver wanted state %x, iflags now %x, " @@ -1017,7 +1118,7 @@ static void handle_urcv(struct ipath_devdata *dd, u64 istat) irqreturn_t ipath_intr(int irq, void *data) { struct ipath_devdata *dd = data; - u32 istat, chk0rcv = 0; + u64 istat, chk0rcv = 0; ipath_err_t estat = 0; irqreturn_t ret; static unsigned unexpected = 0; @@ -1070,17 +1171,17 @@ irqreturn_t ipath_intr(int irq, void *data) if (unlikely(istat & ~dd->ipath_i_bitsextant)) ipath_dev_err(dd, - "interrupt with unknown interrupts %x set\n", - istat & (u32) ~ dd->ipath_i_bitsextant); - else - ipath_cdbg(VERBOSE, "intr stat=0x%x\n", istat); + "interrupt with unknown interrupts %Lx set\n", + istat & ~dd->ipath_i_bitsextant); + else if (istat & ~INFINIPATH_I_ERROR) /* errors do own printing */ + 
ipath_cdbg(VERBOSE, "intr stat=0x%Lx\n", istat); - if (unlikely(istat & INFINIPATH_I_ERROR)) { + if (istat & INFINIPATH_I_ERROR) { ipath_stats.sps_errints++; estat = ipath_read_kreg64(dd, dd->ipath_kregs->kr_errorstatus); if (!estat) - dev_info(&dd->pcidev->dev, "error interrupt (%x), " + dev_info(&dd->pcidev->dev, "error interrupt (%Lx), " "but no error bits set!\n", istat); else if (estat == -1LL) /* @@ -1198,6 +1299,9 @@ irqreturn_t ipath_intr(int irq, void *data) (dd->ipath_i_rcvurg_mask << dd->ipath_i_rcvurg_shift))) handle_urcv(dd, istat); + if (istat & (INFINIPATH_I_SDMAINT | INFINIPATH_I_SDMADISABLED)) + handle_sdma_intr(dd, istat); + if (istat & INFINIPATH_I_SPIOBUFAVAIL) { unsigned long flags; @@ -1208,7 +1312,10 @@ irqreturn_t ipath_intr(int irq, void *data) ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); - handle_layer_pioavail(dd); + if (!(dd->ipath_flags & IPATH_HAS_SEND_DMA)) + handle_layer_pioavail(dd); + else + ipath_dbg("unexpected BUFAVAIL intr\n"); } ret = IRQ_HANDLED; diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 1d5adf6..a4857b9 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -872,7 +872,8 @@ struct sk_buff *ipath_alloc_skb(struct ipath_devdata *dd, gfp_t); extern int ipath_diag_inuse; irqreturn_t ipath_intr(int irq, void *devid); -int ipath_decode_err(char *buf, size_t blen, ipath_err_t err); +int ipath_decode_err(struct ipath_devdata *dd, char *buf, size_t blen, + ipath_err_t err); #if __IPATH_INFO || __IPATH_DBG extern const char *ipath_ibcstatus_str[]; #endif @@ -1027,6 +1028,7 @@ void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val); /* send dma routines */ int setup_sdma(struct ipath_devdata *); void teardown_sdma(struct ipath_devdata *); +void ipath_restart_sdma(struct ipath_devdata *); void ipath_sdma_intr(struct ipath_devdata *); int 
ipath_sdma_verbs_send(struct ipath_devdata *, struct ipath_sge_state *, u32, struct ipath_verbs_txreq *); diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c index 812b42c..ded970b 100644 --- a/drivers/infiniband/hw/ipath/ipath_qp.c +++ b/drivers/infiniband/hw/ipath/ipath_qp.c @@ -340,6 +340,7 @@ static void ipath_reset_qp(struct ipath_qp *qp, enum ib_qp_type type) qp->s_flags &= IPATH_S_SIGNAL_REQ_WR; qp->s_hdrwords = 0; qp->s_wqe = NULL; + qp->s_pkt_delay = 0; qp->s_psn = 0; qp->r_psn = 0; qp->r_msn = 0; @@ -563,8 +564,10 @@ int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, if (attr_mask & IB_QP_ACCESS_FLAGS) qp->qp_access_flags = attr->qp_access_flags; - if (attr_mask & IB_QP_AV) + if (attr_mask & IB_QP_AV) { qp->remote_ah_attr = attr->ah_attr; + qp->s_dmult = ipath_ib_rate_to_mult(attr->ah_attr.static_rate); + } if (attr_mask & IB_QP_PATH_MTU) qp->path_mtu = attr->path_mtu; @@ -850,6 +853,7 @@ struct ib_qp *ipath_create_qp(struct ib_pd *ibpd, goto bail_qp; } qp->ip = NULL; + qp->s_tx = NULL; ipath_reset_qp(qp, init_attr->qp_type); break; @@ -955,12 +959,20 @@ int ipath_destroy_qp(struct ib_qp *ibqp) /* Stop the sending tasklet. */ tasklet_kill(&qp->s_task); + if (qp->s_tx) { + atomic_dec(&qp->refcount); + if (qp->s_tx->txreq.flags & IPATH_SDMA_TXREQ_F_FREEBUF) + kfree(qp->s_tx->txreq.map_addr); + } + /* Make sure the QP isn't on the timeout list. 
*/ spin_lock_irqsave(&dev->pending_lock, flags); if (!list_empty(&qp->timerwait)) list_del_init(&qp->timerwait); if (!list_empty(&qp->piowait)) list_del_init(&qp->piowait); + if (qp->s_tx) + list_add(&qp->s_tx->txreq.list, &dev->txreq_free); spin_unlock_irqrestore(&dev->pending_lock, flags); /* diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index a59bdbd..bcaa291 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -483,14 +483,16 @@ done: static void want_buffer(struct ipath_devdata *dd) { - unsigned long flags; - - spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); - dd->ipath_sendctrl |= INFINIPATH_S_PIOINTBUFAVAIL; - ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - dd->ipath_sendctrl); - ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); - spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + if (!(dd->ipath_flags & IPATH_HAS_SEND_DMA)) { + unsigned long flags; + + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl |= INFINIPATH_S_PIOINTBUFAVAIL; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + } } /** diff --git a/drivers/infiniband/hw/ipath/ipath_sdma.c b/drivers/infiniband/hw/ipath/ipath_sdma.c index 5918caf..1974df7 100644 --- a/drivers/infiniband/hw/ipath/ipath_sdma.c +++ b/drivers/infiniband/hw/ipath/ipath_sdma.c @@ -230,7 +230,6 @@ static void dump_sdma_state(struct ipath_devdata *dd) static void sdma_abort_task(unsigned long opaque) { struct ipath_devdata *dd = (struct ipath_devdata *) opaque; - int kick = 0; u64 status; unsigned long flags; @@ -308,30 +307,26 @@ static void sdma_abort_task(unsigned long opaque) /* done with sdma state for a bit */ spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); - /* restart sdma engine */ + /* + * Don't restart sdma here. 
Wait until link is up to ACTIVE. + * VL15 MADs used to bring the link up use PIO, and multiple + * link transitions otherwise cause the sdma engine to be + * stopped and started multiple times. + * The disable is done here, including the shadow, so the + * state is kept consistent. + * See ipath_restart_sdma() for the actual starting of sdma. + */ spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); dd->ipath_sendctrl &= ~INFINIPATH_S_SDMAENABLE; ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); - dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE; - ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - dd->ipath_sendctrl); - ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); - kick = 1; - ipath_dbg("sdma restarted from abort\n"); - - /* now clear status bits */ - spin_lock_irqsave(&dd->ipath_sdma_lock, flags); - __clear_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); - __clear_bit(IPATH_SDMA_DISARMED, &dd->ipath_sdma_status); - __clear_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); /* make sure I see next message */ dd->ipath_sdma_abort_jiffies = 0; - goto unlock; + goto done; } resched: @@ -353,10 +348,8 @@ resched_noprint: unlock: spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); - - /* kick upper layers */ - if (kick) - ipath_ib_piobufavail(dd->verbs_dev); +done: + return; } /* @@ -481,10 +474,14 @@ int setup_sdma(struct ipath_devdata *dd) tasklet_init(&dd->ipath_sdma_abort_task, sdma_abort_task, (unsigned long) dd); - /* Turn on SDMA */ + /* + * No use to turn on SDMA here, as link is probably not ACTIVE + * Just mark it RUNNING and enable the interrupt, and let the + * ipath_restart_sdma() on link transition to ACTIVE actually + * enable it. 
+ */ spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); - dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE | - INFINIPATH_S_SDMAINTENABLE; + dd->ipath_sendctrl |= INFINIPATH_S_SDMAINTENABLE; ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); __set_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status); @@ -572,6 +569,56 @@ void teardown_sdma(struct ipath_devdata *dd) sdma_descq, sdma_descq_phys); } +/* + * [Re]start SDMA, if we use it, and it's not already OK. + * This is called on transition to link ACTIVE, either the first or + * subsequent times. + */ +void ipath_restart_sdma(struct ipath_devdata *dd) +{ + unsigned long flags; + int needed = 1; + + if (!(dd->ipath_flags & IPATH_HAS_SEND_DMA)) + goto bail; + + /* + * First, make sure we should, which is to say, + * check that we are "RUNNING" (not in teardown) + * and not "SHUTDOWN" + */ + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + if (!test_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status) + || test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + needed = 0; + else { + __clear_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + __clear_bit(IPATH_SDMA_DISARMED, &dd->ipath_sdma_status); + __clear_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + } + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (!needed) { + ipath_dbg("invalid attempt to restart SDMA, status 0x%016llx\n", + dd->ipath_sdma_status); + goto bail; + } + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + /* + * First clear, just to be safe. 
Enable is only done + * in chip on 0->1 transition + */ + dd->ipath_sendctrl &= ~INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + +bail: + return; +} + static inline void make_sdma_desc(struct ipath_devdata *dd, u64 *sdmadesc, u64 addr, u64 dwlen, u64 dwoffset) { diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index adff2f1..1e36bac 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -292,8 +292,8 @@ void ipath_get_faststats(unsigned long opaque) && time_after(jiffies, dd->ipath_unmasktime)) { char ebuf[256]; int iserr; - iserr = ipath_decode_err(ebuf, sizeof ebuf, - dd->ipath_maskederrs); + iserr = ipath_decode_err(dd, ebuf, sizeof ebuf, + dd->ipath_maskederrs); if (dd->ipath_maskederrs & ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | INFINIPATH_E_PKTERRS)) diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index de67eed..4d4d58d 100644 --- a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -303,6 +303,7 @@ int ipath_make_ud_req(struct ipath_qp *qp) qp->s_hdrwords = 7; qp->s_cur_size = wqe->length; qp->s_cur_sge = &qp->s_sge; + qp->s_dmult = ah_attr->static_rate; qp->s_wqe = wqe; qp->s_sge.sge = wqe->sg_list[0]; qp->s_sge.sg_list = wqe->sg_list + 1; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 2e6b6f6..434a0d8 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -242,6 +242,93 @@ static void ipath_flush_wqe(struct ipath_qp *qp, 
struct ib_send_wr *wr) ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 1); } +/* + * Count the number of DMA descriptors needed to send length bytes of data. + * Don't modify the ipath_sge_state to get the count. + * Return zero if any of the segments is not aligned. + */ +static u32 ipath_count_sge(struct ipath_sge_state *ss, u32 length) +{ + struct ipath_sge *sg_list = ss->sg_list; + struct ipath_sge sge = ss->sge; + u8 num_sge = ss->num_sge; + u32 ndesc = 1; /* count the header */ + + while (length) { + u32 len = sge.length; + + if (len > length) + len = length; + if (len > sge.sge_length) + len = sge.sge_length; + BUG_ON(len == 0); + if (((long) sge.vaddr & (sizeof(u32) - 1)) || + (len != length && (len & (sizeof(u32) - 1)))) { + ndesc = 0; + break; + } + ndesc++; + sge.vaddr += len; + sge.length -= len; + sge.sge_length -= len; + if (sge.sge_length == 0) { + if (--num_sge) + sge = *sg_list++; + } else if (sge.length == 0 && sge.mr != NULL) { + if (++sge.n >= IPATH_SEGSZ) { + if (++sge.m >= sge.mr->mapsz) + break; + sge.n = 0; + } + sge.vaddr = + sge.mr->map[sge.m]->segs[sge.n].vaddr; + sge.length = + sge.mr->map[sge.m]->segs[sge.n].length; + } + length -= len; + } + return ndesc; +} + +/* + * Copy from the SGEs to the data buffer. 
+ */ +static void ipath_copy_from_sge(void *data, struct ipath_sge_state *ss, + u32 length) +{ + struct ipath_sge *sge = &ss->sge; + + while (length) { + u32 len = sge->length; + + if (len > length) + len = length; + if (len > sge->sge_length) + len = sge->sge_length; + BUG_ON(len == 0); + memcpy(data, sge->vaddr, len); + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = + sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = + sge->mr->map[sge->m]->segs[sge->n].length; + } + data += len; + length -= len; + } +} + /** * ipath_post_one_send - post one RC, UC, or UD send work request * @qp: the QP to post on @@ -866,13 +953,231 @@ static void copy_io(u32 __iomem *piobuf, struct ipath_sge_state *ss, __raw_writel(last, piobuf); } -static int ipath_verbs_send_pio(struct ipath_qp *qp, u32 *hdr, u32 hdrwords, +/* + * Convert IB rate to delay multiplier. 
+ */ +unsigned ipath_ib_rate_to_mult(enum ib_rate rate) +{ + switch (rate) { + case IB_RATE_2_5_GBPS: return 8; + case IB_RATE_5_GBPS: return 4; + case IB_RATE_10_GBPS: return 2; + case IB_RATE_20_GBPS: return 1; + default: return 0; + } +} + +/* + * Convert delay multiplier to IB rate + */ +enum ib_rate ipath_mult_to_ib_rate(unsigned mult) +{ + switch (mult) { + case 8: return IB_RATE_2_5_GBPS; + case 4: return IB_RATE_5_GBPS; + case 2: return IB_RATE_10_GBPS; + case 1: return IB_RATE_20_GBPS; + default: return IB_RATE_PORT_CURRENT; + } +} + +static inline struct ipath_verbs_txreq *get_txreq(struct ipath_ibdev *dev) +{ + struct ipath_verbs_txreq *tx = NULL; + unsigned long flags; + + spin_lock_irqsave(&dev->pending_lock, flags); + if (!list_empty(&dev->txreq_free)) { + struct list_head *l = dev->txreq_free.next; + + list_del(l); + tx = list_entry(l, struct ipath_verbs_txreq, txreq.list); + } + spin_unlock_irqrestore(&dev->pending_lock, flags); + return tx; +} + +static inline void put_txreq(struct ipath_ibdev *dev, + struct ipath_verbs_txreq *tx) +{ + unsigned long flags; + + spin_lock_irqsave(&dev->pending_lock, flags); + list_add(&tx->txreq.list, &dev->txreq_free); + spin_unlock_irqrestore(&dev->pending_lock, flags); +} + +static void sdma_complete(void *cookie, int status) +{ + struct ipath_verbs_txreq *tx = cookie; + struct ipath_qp *qp = tx->qp; + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + + /* Generate a completion queue entry if needed */ + if (qp->ibqp.qp_type != IB_QPT_RC && tx->wqe) { + enum ib_wc_status ibs = status == IPATH_SDMA_TXREQ_S_OK ? + IB_WC_SUCCESS : IB_WC_WR_FLUSH_ERR; + + ipath_send_complete(qp, tx->wqe, ibs); + } + + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_FREEBUF) + kfree(tx->txreq.map_addr); + put_txreq(dev, tx); + + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); +} + +/* + * Compute the number of clock cycles of delay before sending the next packet. 
+ * The multipliers reflect the number of clocks for the fastest rate so + * one tick at 4xDDR is 8 ticks at 1xSDR. + * If the destination port will take longer to receive a packet than + * the outgoing link can send it, we need to delay sending the next packet + * by the difference in time it takes the receiver to receive and the sender + * to send this packet. + * Note that this delay is always correct for UC and RC but not always + * optimal for UD. For UD, the destination HCA can be different for each + * packet, in which case, we could send packets to a different destination + * while "waiting" for the delay. The overhead for doing this without + * HW support is more than just paying the cost of delaying some packets + * unnecessarily. + */ +static inline unsigned ipath_pkt_delay(u32 plen, u8 snd_mult, u8 rcv_mult) +{ + return (rcv_mult > snd_mult) ? + (plen * (rcv_mult - snd_mult) + 1) >> 1 : 0; +} + +static int ipath_verbs_send_dma(struct ipath_qp *qp, + struct ipath_ib_header *hdr, u32 hdrwords, + struct ipath_sge_state *ss, u32 len, + u32 plen, u32 dwords) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ipath_devdata *dd = dev->dd; + struct ipath_verbs_txreq *tx; + u32 *piobuf; + u32 control; + u32 ndesc; + int ret; + + tx = qp->s_tx; + if (tx) { + qp->s_tx = NULL; + /* resend previously constructed packet */ + ret = ipath_sdma_verbs_send(dd, tx->ss, tx->len, tx); + if (ret) + qp->s_tx = tx; + goto bail; + } + + tx = get_txreq(dev); + if (!tx) { + ret = -EBUSY; + goto bail; + } + + /* + * Get the saved delay count we computed for the previous packet + * and save the delay count for this packet to be used next time + * we get here. 
+ */ + control = qp->s_pkt_delay; + qp->s_pkt_delay = ipath_pkt_delay(plen, dd->delay_mult, qp->s_dmult); + + tx->qp = qp; + atomic_inc(&qp->refcount); + tx->wqe = qp->s_wqe; + tx->txreq.callback = sdma_complete; + tx->txreq.callback_cookie = tx; + tx->txreq.flags = IPATH_SDMA_TXREQ_F_HEADTOHOST | + IPATH_SDMA_TXREQ_F_INTREQ | IPATH_SDMA_TXREQ_F_FREEDESC; + if (plen + 1 >= IPATH_SMALLBUF_DWORDS) + tx->txreq.flags |= IPATH_SDMA_TXREQ_F_USELARGEBUF; + + /* VL15 packets bypass credit check */ + if ((be16_to_cpu(hdr->lrh[0]) >> 12) == 15) { + control |= 1ULL << 31; + tx->txreq.flags |= IPATH_SDMA_TXREQ_F_VL15; + } + + if (len) { + /* + * Don't try to DMA if it takes more descriptors than + * the queue holds. + */ + ndesc = ipath_count_sge(ss, len); + if (ndesc >= dd->ipath_sdma_descq_cnt) + ndesc = 0; + } else + ndesc = 1; + if (ndesc) { + tx->hdr.pbc[0] = cpu_to_le32(plen); + tx->hdr.pbc[1] = cpu_to_le32(control); + memcpy(&tx->hdr.hdr, hdr, hdrwords << 2); + tx->txreq.sg_count = ndesc; + tx->map_len = (hdrwords + 2) << 2; + tx->txreq.map_addr = &tx->hdr; + ret = ipath_sdma_verbs_send(dd, ss, dwords, tx); + if (ret) { + /* save ss and length in dwords */ + tx->ss = ss; + tx->len = dwords; + qp->s_tx = tx; + } + goto bail; + } + + /* Allocate a buffer and copy the header and payload to it. */ + tx->map_len = (plen + 1) << 2; + piobuf = kmalloc(tx->map_len, GFP_ATOMIC); + if (unlikely(piobuf == NULL)) { + ret = -EBUSY; + goto err_tx; + } + tx->txreq.map_addr = piobuf; + tx->txreq.flags |= IPATH_SDMA_TXREQ_F_FREEBUF; + tx->txreq.sg_count = 1; + + *piobuf++ = (__force u32) cpu_to_le32(plen); + *piobuf++ = (__force u32) cpu_to_le32(control); + memcpy(piobuf, hdr, hdrwords << 2); + ipath_copy_from_sge(piobuf + hdrwords, ss, len); + + ret = ipath_sdma_verbs_send(dd, NULL, 0, tx); + /* + * If we couldn't queue the DMA request, save the info + * and try again later rather than destroying the + * buffer and undoing the side effects of the copy. 
+ */ + if (ret) { + tx->ss = NULL; + tx->len = 0; + qp->s_tx = tx; + } + dev->n_unaligned++; + goto bail; + +err_tx: + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); + put_txreq(dev, tx); +bail: + return ret; +} + +static int ipath_verbs_send_pio(struct ipath_qp *qp, + struct ipath_ib_header *ibhdr, u32 hdrwords, struct ipath_sge_state *ss, u32 len, u32 plen, u32 dwords) { struct ipath_devdata *dd = to_idev(qp->ibqp.device)->dd; + u32 *hdr = (u32 *) ibhdr; u32 __iomem *piobuf; unsigned flush_wc; + u32 control; int ret; piobuf = ipath_getpiobuf(dd, plen, NULL); @@ -882,11 +1187,23 @@ static int ipath_verbs_send_pio(struct ipath_qp *qp, u32 *hdr, u32 hdrwords, } /* - * Write len to control qword, no flags. + * Get the saved delay count we computed for the previous packet + * and save the delay count for this packet to be used next time + * we get here. + */ + control = qp->s_pkt_delay; + qp->s_pkt_delay = ipath_pkt_delay(plen, dd->delay_mult, qp->s_dmult); + + /* VL15 packets bypass credit check */ + if ((be16_to_cpu(ibhdr->lrh[0]) >> 12) == 15) + control |= 1ULL << 31; + + /* + * Write the length to the control qword plus any needed flags. * We have to flush after the PBC for correctness on some cpus * or WC buffer can be written out of order. */ - writeq(plen, piobuf); + writeq(((u64) control << 32) | plen, piobuf); piobuf += 2; flush_wc = dd->ipath_flags & IPATH_PIO_FLUSH_WC; @@ -961,15 +1278,25 @@ int ipath_verbs_send(struct ipath_qp *qp, struct ipath_ib_header *hdr, */ plen = hdrwords + dwords + 1; - /* Drop non-VL15 packets if we are not in the active state */ - if (!(dd->ipath_flags & IPATH_LINKACTIVE) && - qp->ibqp.qp_type != IB_QPT_SMI) { + /* + * VL15 packets (IB_QPT_SMI) will always use PIO, so we + * can defer SDMA restart until link goes ACTIVE without + * worrying about just how we got there. 
+ */ + if (qp->ibqp.qp_type == IB_QPT_SMI) + ret = ipath_verbs_send_pio(qp, hdr, hdrwords, ss, len, + plen, dwords); + /* All non-VL15 packets are dropped if link is not ACTIVE */ + else if (!(dd->ipath_flags & IPATH_LINKACTIVE)) { if (qp->s_wqe) ipath_send_complete(qp, qp->s_wqe, IB_WC_SUCCESS); ret = 0; - } else - ret = ipath_verbs_send_pio(qp, (u32 *) hdr, hdrwords, - ss, len, plen, dwords); + } else if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + ret = ipath_verbs_send_dma(qp, hdr, hdrwords, ss, len, + plen, dwords); + else + ret = ipath_verbs_send_pio(qp, hdr, hdrwords, ss, len, + plen, dwords); return ret; } @@ -1038,6 +1365,12 @@ int ipath_get_counters(struct ipath_devdata *dd, ipath_snap_cntr(dd, crp->cr_errlpcrccnt) + ipath_snap_cntr(dd, crp->cr_badformatcnt) + dd->ipath_rxfc_unsupvl_errs; + if (crp->cr_rxotherlocalphyerrcnt) + cntrs->port_rcv_errors += + ipath_snap_cntr(dd, crp->cr_rxotherlocalphyerrcnt); + if (crp->cr_rxvlerrcnt) + cntrs->port_rcv_errors += + ipath_snap_cntr(dd, crp->cr_rxvlerrcnt); cntrs->port_rcv_remphys_errors = ipath_snap_cntr(dd, crp->cr_rcvebpcnt); cntrs->port_xmit_discards = ipath_snap_cntr(dd, crp->cr_unsupvlcnt); @@ -1046,9 +1379,16 @@ int ipath_get_counters(struct ipath_devdata *dd, cntrs->port_xmit_packets = ipath_snap_cntr(dd, crp->cr_pktsendcnt); cntrs->port_rcv_packets = ipath_snap_cntr(dd, crp->cr_pktrcvcnt); cntrs->local_link_integrity_errors = - (dd->ipath_flags & IPATH_GPIO_ERRINTRS) ? - dd->ipath_lli_errs : dd->ipath_lli_errors; - cntrs->excessive_buffer_overrun_errors = dd->ipath_overrun_thresh_errs; + crp->cr_locallinkintegrityerrcnt ? + ipath_snap_cntr(dd, crp->cr_locallinkintegrityerrcnt) : + ((dd->ipath_flags & IPATH_GPIO_ERRINTRS) ? + dd->ipath_lli_errs : dd->ipath_lli_errors); + cntrs->excessive_buffer_overrun_errors = + crp->cr_excessbufferovflcnt ? + ipath_snap_cntr(dd, crp->cr_excessbufferovflcnt) : + dd->ipath_overrun_thresh_errs; + cntrs->vl15_dropped = crp->cr_vl15droppedpktcnt ? 
+ ipath_snap_cntr(dd, crp->cr_vl15droppedpktcnt) : 0; ret = 0; @@ -1396,6 +1736,7 @@ static struct ib_ah *ipath_create_ah(struct ib_pd *pd, /* ib_create_ah() will initialize ah->ibah. */ ah->attr = *ah_attr; + ah->attr.static_rate = ipath_ib_rate_to_mult(ah_attr->static_rate); ret = &ah->ibah; @@ -1429,6 +1770,7 @@ static int ipath_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr) struct ipath_ah *ah = to_iah(ibah); *ah_attr = ah->attr; + ah_attr->static_rate = ipath_mult_to_ib_rate(ah->attr.static_rate); return 0; } @@ -1578,6 +1920,8 @@ int ipath_register_ib_device(struct ipath_devdata *dd) struct ipath_verbs_counters cntrs; struct ipath_ibdev *idev; struct ib_device *dev; + struct ipath_verbs_txreq *tx; + unsigned i; int ret; idev = (struct ipath_ibdev *)ib_alloc_device(sizeof *idev); @@ -1588,6 +1932,17 @@ int ipath_register_ib_device(struct ipath_devdata *dd) dev = &idev->ibdev; + if (dd->ipath_sdma_descq_cnt) { + tx = kmalloc(dd->ipath_sdma_descq_cnt * sizeof *tx, + GFP_KERNEL); + if (tx == NULL) { + ret = -ENOMEM; + goto err_tx; + } + } else + tx = NULL; + idev->txreq_bufs = tx; + /* Only need to initialize non-zero fields. 
*/ spin_lock_init(&idev->n_pds_lock); spin_lock_init(&idev->n_ahs_lock); @@ -1628,6 +1983,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd) INIT_LIST_HEAD(&idev->pending[2]); INIT_LIST_HEAD(&idev->piowait); INIT_LIST_HEAD(&idev->rnrwait); + INIT_LIST_HEAD(&idev->txreq_free); idev->pending_index = 0; idev->port_cap_flags = IB_PORT_SYS_IMAGE_GUID_SUP | IB_PORT_CLIENT_REG_SUP; @@ -1659,6 +2015,9 @@ int ipath_register_ib_device(struct ipath_devdata *dd) cntrs.excessive_buffer_overrun_errors; idev->z_vl15_dropped = cntrs.vl15_dropped; + for (i = 0; i < dd->ipath_sdma_descq_cnt; i++, tx++) + list_add(&tx->txreq.list, &idev->txreq_free); + /* * The system image GUID is supposed to be the same for all * IB HCAs in a single system but since there can be other @@ -1708,6 +2067,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd) dev->phys_port_cnt = 1; dev->num_comp_vectors = 1; dev->dma_device = &dd->pcidev->dev; + dev->class_dev.dev = dev->dma_device; dev->query_device = ipath_query_device; dev->modify_device = ipath_modify_device; dev->query_port = ipath_query_port; @@ -1772,6 +2132,8 @@ err_reg: err_lk: kfree(idev->qp_table.table); err_qp: + kfree(idev->txreq_bufs); +err_tx: ib_dealloc_device(dev); ipath_dev_err(dd, "cannot register verbs: %d!\n", -ret); idev = NULL; @@ -1806,6 +2168,7 @@ void ipath_unregister_ib_device(struct ipath_ibdev *dev) ipath_free_all_qps(&dev->qp_table); kfree(dev->qp_table.table); kfree(dev->lk_table.table); + kfree(dev->txreq_bufs); ib_dealloc_device(ibdev); } @@ -1853,13 +2216,15 @@ static ssize_t show_stats(struct class_device *cdev, char *buf) "RC stalls %d\n" "piobuf wait %d\n" "no piobuf %d\n" + "unaligned %d\n" "PKT drops %d\n" "WQE errs %d\n", dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks, dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks, dev->n_other_naks, dev->n_timeouts, dev->n_rdma_dup_busy, dev->n_rc_stalls, dev->n_piowait, - dev->n_no_piobuf, dev->n_pkt_drops, dev->n_wqe_errs); + dev->n_no_piobuf, 
dev->n_unaligned, + dev->n_pkt_drops, dev->n_wqe_errs); for (i = 0; i < ARRAY_SIZE(dev->opstats); i++) { const struct ipath_opcode_stats *si = &dev->opstats[i]; From ralph.campbell at qlogic.com Wed Apr 2 15:50:43 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:43 -0700 Subject: [ofa-general] [PATCH 20/20] IB/ipath - Update copyright dates for files changed in 2008 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225043.28598.98936.stgit@eng-46.mv.qlogic.com> This patch updates the copyright date for files modified in 2008. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_common.h | 2 +- drivers/infiniband/hw/ipath/ipath_diag.c | 2 +- drivers/infiniband/hw/ipath/ipath_driver.c | 2 +- drivers/infiniband/hw/ipath/ipath_eeprom.c | 2 +- drivers/infiniband/hw/ipath/ipath_file_ops.c | 2 +- drivers/infiniband/hw/ipath/ipath_iba6120.c | 2 +- drivers/infiniband/hw/ipath/ipath_init_chip.c | 2 +- drivers/infiniband/hw/ipath/ipath_intr.c | 2 +- drivers/infiniband/hw/ipath/ipath_kernel.h | 2 +- drivers/infiniband/hw/ipath/ipath_mad.c | 2 +- drivers/infiniband/hw/ipath/ipath_qp.c | 2 +- drivers/infiniband/hw/ipath/ipath_rc.c | 2 +- drivers/infiniband/hw/ipath/ipath_srq.c | 2 +- drivers/infiniband/hw/ipath/ipath_stats.c | 2 +- drivers/infiniband/hw/ipath/ipath_sysfs.c | 2 +- drivers/infiniband/hw/ipath/ipath_ud.c | 2 +- drivers/infiniband/hw/ipath/ipath_verbs.c | 2 +- drivers/infiniband/hw/ipath/ipath_verbs.h | 2 +- 18 files changed, 18 insertions(+), 18 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 2cf7cd2..28cfe97 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. 
+ * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c index c9bfd82..6d49d2f 100644 --- a/drivers/infiniband/hw/ipath/ipath_diag.c +++ b/drivers/infiniband/hw/ipath/ipath_diag.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 8ccc915..9121529 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c index 72f90e8..dc37277 100644 --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. 
* * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index d38ba29..1e627aa 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index c8d8f1a..421cc2a 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index b43c2a1..27dd894 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index d0088d5..1b58f47 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. 
All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index a4857b9..f856015 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -1,7 +1,7 @@ #ifndef _IPATH_KERNEL_H #define _IPATH_KERNEL_H /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c index babcc09..1ff46ae 100644 --- a/drivers/infiniband/hw/ipath/ipath_mad.c +++ b/drivers/infiniband/hw/ipath/ipath_mad.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c index ded970b..dd5b6e9 100644 --- a/drivers/infiniband/hw/ipath/ipath_qp.c +++ b/drivers/infiniband/hw/ipath/ipath_qp.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. 
* * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index f765d48..4679819 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_srq.c b/drivers/infiniband/hw/ipath/ipath_srq.c index 3366d66..e3d80ca 100644 --- a/drivers/infiniband/hw/ipath/ipath_srq.c +++ b/drivers/infiniband/hw/ipath/ipath_srq.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index 1e36bac..c8e3d65 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c index 2e6d2aa..a6c8efb 100644 --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c +++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. 
All rights reserved. * Copyright (c) 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index 4d4d58d..918f520 100644 --- a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 434a0d8..d174694 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index 056e741..65ddfc9 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two
From sweitzen at cisco.com Wed Apr 2 16:00:20 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 16:00:20 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? Message-ID: I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have ib_rdma_lat working, so I know IB verbs are working. How do I use uDAPL, though? All the default /etc/dat.conf entries have IPoIB or bonding interfaces in them. Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbernstein at penguincomputing.com Wed Apr 2 16:04:53 2008 From: jbernstein at penguincomputing.com (Joshua Bernstein) Date: Wed, 2 Apr 2008 16:04:53 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: Message-ID: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Scott, On Apr 2, 2008, at 4:00 PM, Scott Weitzenkamp (sweitzen) wrote: > I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have > ib_rdma_lat working, so I know IB verbs are working. > > How do I use uDAPL, though? All the default /etc/dat.conf entries > have IPoIB or bonding interfaces in them. What you will want to do is edit /etc/ofed/dat64.conf or other related dat.conf file and change the name of the device from "ib0" to the name of the interface that the Chelsio card came up as.
For example, with my NetXen cards coming up as eth2, the first two lines of my /etc/ofed/dat64.conf file look like this: OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "eth2 0" "" #OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib0 0" "" Notice how I've commented out the ib0 line and simply changed it to eth2. Then you can use, say, HP-MPI with the -UDAPL option. Other MPI stacks have similar methods of telling them to use the uDAPL transport. -Joshua Bernstein Software Engineer Penguin Computing From clameter at sgi.com Wed Apr 2 16:04:42 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 16:04:42 -0700 (PDT) Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402220936.GW19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402215334.GT19189@duo.random> <20080402220936.GW19189@duo.random> Message-ID: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > I said try_to_unmap_cluster, not get_user_pages. > > CPU0 CPU1 > try_to_unmap_cluster: > emm_invalidate_start in EMM (or mmu_notifier_invalidate_range_start in #v10) > walking the list by hand in EMM (or with hlist cleaner in #v10) > xpmem method invoked > schedule for a long while inside invalidate_range_start while skbs are sent > gru registers > synchronize_rcu (sorry useless now) All of this would be much easier if you could stop the drivel. The sync rcu was for an earlier release of the mmu notifier. Why the sniping? > single threaded, so taking a page fault > secondary tlb instantiated The driver must not allow faults to occur between start and end. The trouble here is that GRU and xpmem are mixed. If CPU0 would have been running GRU instead of XPMEM then the fault would not have occurred because the gru would have noticed that a range op is active.
If both systems would have run xpmem then the same would have worked. I guess this means that an address space cannot reliably be registered to multiple subsystems if some of those do not take a refcount. If all drivers would be required to take a refcount then this would also not occur. > In general my #v10 solution mixing seqlock + rcu looks more robust and > allows multithreaded attachment of mmu notifers as well. I could have Well it's easy to say that if no one else has looked at it yet. I expressed some concerns in reply to your post of #v10. From sweitzen at cisco.com Wed Apr 2 16:07:37 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 16:07:37 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: I tried that, and it didn't work: [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" "" [root at svbu-qa2950-1 ~]# dtest 10194 Running as server - OpenIB-cma 10194 Error dat_ep_create: DAT_INVALID_HANDLE 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP 10194: DAPL Test Complete.
10194: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 10194: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 10194: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 10194: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 10194: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 10194: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 10194: open: 32254.93 usec 10194: close: 31936.17 usec 10194: PZ create: 7.15 usec 10194: PZ free: 4.05 usec 10194: LMR create: 36.00 usec 10194: LMR free: 22.89 usec 10194: EVD create: 6.91 usec 10194: EVD free: 11.92 usec 10194: EP create: 28.85 usec 10194: EP free: 0.00 usec 10194: TOTAL: 106.57 usec Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems > -----Original Message----- > From: Joshua Bernstein [mailto:jbernstein at penguincomputing.com] > Sent: Wednesday, April 02, 2008 4:05 PM > To: Scott Weitzenkamp (sweitzen) > Cc: [ofa_general]; OpenFabrics EWG > Subject: Re: [ofa-general] how do I use uDAPL with iWARP? > > Scott, > > On Apr 2, 2008, at 4:00 PM, Scott Weitzenkamp (sweitzen) wrote: > > I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have > > ib_rdma_lat working, so I know IB verbs are working. > > > > How do I use uDAPL, though? All the default /etc/dat.conf entries > > have IPoIB or bonding interfaces in them. > > What you will want to do is edit /etc/ofed/dat64.conf or other > related dat.conf file and change the name of the device from > "ib0" to > the name of the interface that the Chelsio card came up as. So for > example with my NetXen cards coming up at eth2, so for example the > first two lines of my /etc/ofed/dat64.conf file look like this: > > OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl. > 1.2 "eth2 0" "" > #OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl. 
> 1.2 "ib0 0" "" > > Notice how I've commented out the ib0 line and simply changed > that to > be eth2. Then you can use say HP-MPI for example using the -UDAPL > option. Other MPI stacks have similar methods of telling them to use > the UDAPL transport. > > -Joshua Bernstein > Software Engineer > Penguin Computing > From jbernstein at penguincomputing.com Wed Apr 2 16:09:10 2008 From: jbernstein at penguincomputing.com (Joshua Bernstein) Date: Wed, 2 Apr 2008 16:09:10 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: On Apr 2, 2008, at 4:07 PM, Scott Weitzenkamp (sweitzen) wrote: > I tried that, and it didn't work: > > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 > "eth2 0" > "" > [root at svbu-qa2950-1 ~]# dtest > 10194 Running as server - OpenIB-cma > 10194 Error dat_ep_create: DAT_INVALID_HANDLE > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP Ah, it is using the correct device then. Do you have the rdma_ucm modules loaded? -Josh >> -----Original Message----- >> From: Joshua Bernstein [mailto:jbernstein at penguincomputing.com] >> Sent: Wednesday, April 02, 2008 4:05 PM >> To: Scott Weitzenkamp (sweitzen) >> Cc: [ofa_general]; OpenFabrics EWG >> Subject: Re: [ofa-general] how do I use uDAPL with iWARP? >> >> Scott, >> >> On Apr 2, 2008, at 4:00 PM, Scott Weitzenkamp (sweitzen) wrote: >>> I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have >>> ib_rdma_lat working, so I know IB verbs are working. >>> >>> How do I use uDAPL, though? All the default /etc/dat.conf entries >>> have IPoIB or bonding interfaces in them. >> >> What you will want to do is edit /etc/ofed/dat64.conf or other >> related dat.conf file and change the name of the device from >> "ib0" to >> the name of the interface that the Chelsio card came up as. 
So for >> example with my NetXen cards coming up at eth2, so for example the >> first two lines of my /etc/ofed/dat64.conf file look like this: >> >> OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl. >> 1.2 "eth2 0" "" >> #OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl. >> 1.2 "ib0 0" "" >> >> Notice how I've commented out the ib0 line and simply changed >> that to >> be eth2. Then you can use say HP-MPI for example using the -UDAPL >> option. Other MPI stacks have similar methods of telling them to use >> the UDAPL transport. >> >> -Joshua Bernstein >> Software Engineer >> Penguin Computing >> From sweitzen at cisco.com Wed Apr 2 16:28:31 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 16:28:31 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: > > I tried that, and it didn't work: > > > > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf > > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 > > "eth2 0" > > "" > > [root at svbu-qa2950-1 ~]# dtest > > 10194 Running as server - OpenIB-cma > > 10194 Error dat_ep_create: DAT_INVALID_HANDLE > > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP > > Ah, it is using the correct device then. Do you have the rdma_ucm > modules loaded? 
Yes, I do: [root at svbu-qa2950-1 ~]# lsmod | grep cm rdma_ucm 47232 0 ib_uverbs 75568 1 rdma_ucm rdma_cm 67348 2 rdma_ucm,ib_sdp ib_cm 67496 2 ib_ipoib,rdma_cm iw_cm 43656 1 rdma_cm ib_sa 74632 3 ib_ipoib,rdma_cm,ib_cm ib_mad 70948 5 ib_umad,mlx4_ib,ib_mthca,ib_cm,ib_sa ib_core 97664 13 rdma_ucm,ib_sdp,ib_ipoib,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad ib_addr 41992 1 rdma_cm From arlin.r.davis at intel.com Wed Apr 2 17:40:15 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Wed, 2 Apr 2008 17:40:15 -0700 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: >-----Original Message----- >From: ewg-bounces at lists.openfabrics.org >[mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Scott >Weitzenkamp (sweitzen) >Sent: Wednesday, April 02, 2008 4:29 PM >To: Joshua Bernstein >Cc: OpenFabrics EWG; [ofa_general] >Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? > >> > I tried that, and it didn't work: >> > >> > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >> > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 >> > "eth2 0" >> > "" >> > [root at svbu-qa2950-1 ~]# dtest >> > 10194 Running as server - OpenIB-cma >> > 10194 Error dat_ep_create: DAT_INVALID_HANDLE >> > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >> Scott, I don't have any iWARP adapters so I am guessing here. Usually it is an attribute issue with QP create. The dtest is possibly setting QP attributes beyond the device max values. Can you run ibv_devinfo -v and send the output. Also, do you know if this device supports inline_data? uDAPL creates the QP with inline_data set to 64 bytes by default. You can override this with the environment variable DAPL_MAX_INLINE. Also, uDAPL uses cma. Did you happen to test with "ib_rdma_lat -c" ?
-arlin From andrea at qumranet.com Wed Apr 2 17:42:46 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 02:42:46 +0200 Subject: [ofa-general] Re: [PATCH 1 of 8] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080403004246.GA16633@duo.random> On Wed, Apr 02, 2008 at 03:34:01PM -0700, Christoph Lameter wrote: > Still two methods ... Yes, the invalidate_page is called with the core VM holding a reference on the page _after_ the tlb flush. The invalidate_end is called after the page has been freed already and after the tlb flush. They've different semantics and with invalidate_page there's no need to block the kvm fault handler. But invalidate_page is only the most efficient for operations that aren't creating holes in the vma, for the rest invalidate_range_start/end provides the best performance by reducing the number of tlb flushes. > seqlock just taken for checking if everything is ok? Exactly. > The critical section could be run multiple times for one callback which > could result in multiple callbacks to clear the young bit. Guess not that > big of an issue? Yes, that's ok. > Ok. Retry would try to invalidate the page a second time which is not a > problem unless you would drop the refcount or make other state changes > that require correspondence with mapping. I guess this is the reason > that you stopped adding a refcount? The current patch using mmu notifiers is already robust against multiple invalidates. The refcounting represent a spte mapping, if we already invalidated it, the spte will be nonpresent and there's no page to unpin. The removal of the refcount is only a microoptimization. > Multiple invalidate_range_starts on the same range? This means the driver > needs to be able to deal with the situation and ignore the repeated > call? The driver would need to store current->pid in a list and remove it in range_stop. And range_stop would need to do nothing at all, if the pid isn't found in the list. 
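The pid bookkeeping just described (range_start records current->pid in a list, range_stop drops it and ignores an entry that was never added) can be sketched in a few lines. This is a hypothetical userspace model for illustration only, not kernel code; the function names and the fixed-size list are assumptions:

```c
#include <assert.h>

/* Hypothetical userspace model of the driver-side bookkeeping:
 * one slot per invalidate_range_start that has not yet seen its
 * matching range_stop. */
#define MAX_INFLIGHT 16
static int inflight[MAX_INFLIGHT];
static int ninflight;

/* range_start: remember that this pid has an invalidate in flight. */
static void range_start_track(int pid)
{
    if (ninflight < MAX_INFLIGHT)
        inflight[ninflight++] = pid;
}

/* range_stop: drop the entry if present; return 1 if it was found,
 * 0 for an unmatched stop, which the driver must simply ignore. */
static int range_stop_track(int pid)
{
    for (int i = 0; i < ninflight; i++) {
        if (inflight[i] == pid) {
            inflight[i] = inflight[--ninflight]; /* swap-remove */
            return 1;
        }
    }
    return 0;
}

/* A secondary page fault may only instantiate an spte while no
 * invalidation is in flight. */
static int faults_allowed(void)
{
    return ninflight == 0;
}
```

As the thread notes, this handles repeated or unmatched callbacks, but it does not by itself solve a registration landing in the middle of a start/stop pair.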
But thinking more I'm not convinced the driver is safe by ignoring if range_end runs before range_begin (pid not found in the list). And I don't see a clear way to fix it not internally to the device driver nor externally. So the repeated call is easy to handle for the driver. What is not trivial is to block the secondary page faults when mmu_notifier_register happens in the middle of range_start/end critical section. sptes can be established in between range_start/_end and that shouldn't happen. So the core problem returns to be how to handle mmu_notifier_register happening in the middle of _range_start/_end, dismissing it as a job for the driver seems not feasible (you have the same problem with EMM of course). > Retry can lead to multiple invalidate_range callbacks with the same > parameters? Driver needs to ignore if the range is already clear? Mostly covered above. From clameter at sgi.com Wed Apr 2 18:03:50 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 18:03:50 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 8] Core of mmu notifiers In-Reply-To: <20080403004246.GA16633@duo.random> References: <20080403004246.GA16633@duo.random> Message-ID: Thinking about this adventurous locking some more: I think you are misunderstanding what a seqlock is. It is *not* a spinlock. The critical read section with the reading of a version before and after allows you access to a certain version of memory how it is or was some time ago (caching effect).
It does not mean that the current state of memory is fixed and neither does it allow syncing when an item is added to the list. So it could be that you are traversing a list that is missing one item because it is not visible to this processor yet. You may just see a state from the past. I would think that you will need a real lock in order to get the desired effect. From clameter at sgi.com Wed Apr 2 18:24:15 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 18:24:15 -0700 (PDT) Subject: [ofa-general] EMM: disable other notifiers before register and unregister In-Reply-To: <20080402221716.GY19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> Message-ID: OK, let's forget about the single threaded thing to solve the registration races. As Andrea pointed out this still has issues with other subscribed subsystems (and also try_to_unmap). We could do something like what stop_machine_run does: First disable all running subsystems before registering a new one. Maybe this is a possible solution. Subject: EMM: disable other notifiers before register and unregister As Andrea has pointed out: There are races during registration if other subsystem notifiers are active while we register a callback. Solve that issue by adding two new notifiers: emm_stop Stops the notifier operations. Notifier must block on invalidate_start and emm_referenced from this point on. If an invalidate_start has not been completed by a call to invalidate_end then the driver must wait until the operation is complete before returning. emm_start Restart notifier operations. Before registration all other subscribed subsystems are stopped. Then the new subsystem is subscribed and things can get running without consistency issues. Subsystems are restarted after the lists have been updated. This also works for unregistering.
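To make the emm_stop/emm_start contract concrete, here is a minimal userspace model of one subscriber. This is an illustrative sketch only: the real callback takes an emm_notifier, an mm_struct, and a range, and the "wait for pending invalidates" step would actually block rather than assert.

```c
#include <assert.h>

/* Illustrative model of one subscribed subsystem's state. */
static int stopped;               /* set between emm_stop and emm_start */
static int invalidates_in_flight; /* starts without a matching end */

static void my_invalidate_start(void)
{
    /* Must not begin a new invalidate while stopped. */
    assert(!stopped);
    invalidates_in_flight++;
}

static void my_invalidate_end(void)
{
    invalidates_in_flight--;
}

/* emm_stop: refuse further operations; the real code would wait here
 * until every pending invalidate_start has seen its invalidate_end. */
static void my_emm_stop(void)
{
    stopped = 1;
    assert(invalidates_in_flight == 0); /* stand-in for waiting */
}

static void my_emm_start(void)
{
    stopped = 0;
}

/* Secondary faults are only serviced while running and quiescent. */
static int my_fault_allowed(void)
{
    return !stopped && invalidates_in_flight == 0;
}
```

With all subscribers quiescent between my_emm_stop() and my_emm_start(), the list of notifiers can be edited without racing against invalidations, which is exactly what the register/unregister paths above rely on.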
If we can get all subsystems to stop then we can also reliably unregister a subsystem. So provide that callback. Signed-off-by: Christoph Lameter --- include/linux/rmap.h | 10 +++++++--- mm/rmap.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 3 deletions(-) Index: linux-2.6/include/linux/rmap.h =================================================================== --- linux-2.6.orig/include/linux/rmap.h 2008-04-02 18:16:07.906032549 -0700 +++ linux-2.6/include/linux/rmap.h 2008-04-02 18:17:10.291070009 -0700 @@ -94,7 +94,9 @@ enum emm_operation { emm_release, /* Process exiting, */ emm_invalidate_start, /* Before the VM unmaps pages */ emm_invalidate_end, /* After the VM unmapped pages */ - emm_referenced /* Check if a range was referenced */ + emm_referenced, /* Check if a range was referenced */ + emm_stop, /* Halt all faults/invalidate_starts */ + emm_start, /* Restart operations */ }; struct emm_notifier { @@ -126,13 +128,15 @@ static inline int emm_notify(struct mm_s /* * Register a notifier with an mm struct. Release occurs when the process - * terminates by calling the notifier function with emm_release. + * terminates by calling the notifier function with emm_release or when + * emm_notifier_unregister is called. * * Must hold the mmap_sem for write. 
*/ extern void emm_notifier_register(struct emm_notifier *e, struct mm_struct *mm); - +extern void emm_notifier_unregister(struct emm_notifier *e, + struct mm_struct *mm); /* * Called from mm/vmscan.c to handle paging out Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c 2008-04-02 18:16:09.378057062 -0700 +++ linux-2.6/mm/rmap.c 2008-04-02 18:16:10.710079201 -0700 @@ -289,16 +289,46 @@ void emm_notifier_release(struct mm_stru /* Register a notifier */ void emm_notifier_register(struct emm_notifier *e, struct mm_struct *mm) { + /* Bring all other notifiers into a quiescent state */ + emm_notify(mm, emm_stop, 0, TASK_SIZE); + e->next = mm->emm_notifier; + /* * The update to emm_notifier (e->next) must be visible * before the pointer becomes visible. * rcu_assign_pointer() does exactly what we need. */ rcu_assign_pointer(mm->emm_notifier, e); + + /* Continue notifiers */ + emm_notify(mm, emm_start, 0, TASK_SIZE); } EXPORT_SYMBOL_GPL(emm_notifier_register); +/* Unregister a notifier */ +void emm_notifier_unregister(struct emm_notifier *e, struct mm_struct *mm) +{ + struct emm_notifier *p; + + emm_notify(mm, emm_stop, 0, TASK_SIZE); + + p = mm->emm_notifier; + if (e == p) + mm->emm_notifier = e->next; + else { + while (p->next != e) + p = p->next; + + p->next = e->next; + } + e->next = mm->emm_notifier; + + emm_notify(mm, emm_start, 0, TASK_SIZE); + e->callback(e, mm, emm_release, 0, TASK_SIZE); +} +EXPORT_SYMBOL_GPL(emm_notifier_unregister); + /* * Perform a callback * From jbernstein at penguincomputing.com Wed Apr 2 19:58:44 2008 From: jbernstein at penguincomputing.com (Joshua Bernstein) Date: Wed, 2 Apr 2008 19:58:44 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? 
In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: <5747B828-9F2E-47E4-9658-52A232147C37@penguincomputing.com> Have you checked to make sure the right user space end points are available in /sys? Does using strace give you any hints? -Josh On Apr 2, 2008, at 4:28 PM, Scott Weitzenkamp (sweitzen) wrote: >>> I tried that, and it didn't work: >>> >>> [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >>> OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 >>> "eth2 0" >>> "" >>> [root at svbu-qa2950-1 ~]# dtest >>> 10194 Running as server - OpenIB-cma >>> 10194 Error dat_ep_create: DAT_INVALID_HANDLE >>> 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >> >> Ah, it is using the correct device then. Do you have the rdma_ucm >> modules loaded? > > Yes, I do: > > [root at svbu-qa2950-1 ~]# lsmod | grep cm > rdma_ucm 47232 0 > ib_uverbs 75568 1 rdma_ucm > rdma_cm 67348 2 rdma_ucm,ib_sdp > ib_cm 67496 2 ib_ipoib,rdma_cm > iw_cm 43656 1 rdma_cm > ib_sa 74632 3 ib_ipoib,rdma_cm,ib_cm > ib_mad 70948 5 ib_umad,mlx4_ib,ib_mthca,ib_cm,ib_sa > ib_core 97664 13 > rdma_ucm,ib_sdp,ib_ipoib,ib_uverbs,ib_umad,iw_c > xgb3,mlx4_ib,ib_mthca,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad > ib_addr 41992 1 rdma_cm From a.p.zijlstra at chello.nl Thu Apr 3 03:40:46 2008 From: a.p.zijlstra at chello.nl (Peter Zijlstra) Date: Thu, 03 Apr 2008 12:40:46 +0200 Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402212515.GS19189@duo.random> Message-ID: <1207219246.8514.817.camel@twins> On Wed, 2008-04-02 at 14:33 -0700, Christoph Lameter wrote: > On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > > > but anyway it's silly to be hardwired to such an interface that worst > > of all requires switch statements instead of proper pointer to > > functions and a fixed set of parameters and 
retval semantics for all > > methods. > > The EMM API with a single callback is the simplest approach at this point. > A common callback for all operations allows the driver to implement common > entry and exit code as seen in XPMem. It seems to me that common code can be shared using functions? No need to stuff everything into a single function. We have method vectors all over the kernel; we could do a_ops as a single callback too, but we don't. FWIW I prefer separate methods. > I guess we can complicate this more by switching to a different API or > adding additional emm_xxx() callbacks if need be, but I really want to have > a strong case for why this would be needed. There is the danger of > adding frills with special callbacks in this and that situation that could > make the notifier complicated and specific to a certain usage scenario. > > Having this generic, simple interface will hopefully avoid such things. > > From a.p.zijlstra at chello.nl Thu Apr 3 03:40:48 2008 From: a.p.zijlstra at chello.nl (Peter Zijlstra) Date: Thu, 03 Apr 2008 12:40:48 +0200 Subject: [ofa-general] Re: EMM: disable other notifiers before register and unregister In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> Message-ID: <1207219248.8514.819.camel@twins> On Wed, 2008-04-02 at 18:24 -0700, Christoph Lameter wrote: > OK, let's forget about the single-threaded thing to solve the registration > races. As Andrea pointed out, this still has issues with other subscribed > subsystems (and also try_to_unmap). We could do something like what > stop_machine_run does: first disable all running subsystems before > registering a new one. > > Maybe this is a possible solution. 
> > > Subject: EMM: disable other notifiers before register and unregister > > As Andrea has pointed out: There are races during registration if other > subsystem notifiers are active while we register a callback. > > Solve that issue by adding two new notifiers: > > emm_stop > Stops the notifier operations. Notifier must block on > invalidate_start and emm_referenced from this point on. > If an invalidate_start has not been completed by a call > to invalidate_end then the driver must wait until the > operation is complete before returning. > > emm_start > Restart notifier operations. Please use pause and resume or something like that. stop-start is an unnatural order; we usually start before we stop, whereas we pause first and resume later. > Before registration all other subscribed subsystems are stopped. > Then the new subsystem is subscribed and things can get running > without consistency issues. > > Subsystems are restarted after the lists have been updated. > > This also works for unregistering. If we can get all subsystems > to stop then we can also reliably unregister a subsystem. So > provide that callback. 
> > Signed-off-by: Christoph Lameter > > --- > include/linux/rmap.h | 10 +++++++--- > mm/rmap.c | 30 ++++++++++++++++++++++++++++++ > 2 files changed, 37 insertions(+), 3 deletions(-) > > Index: linux-2.6/include/linux/rmap.h > =================================================================== > --- linux-2.6.orig/include/linux/rmap.h 2008-04-02 18:16:07.906032549 -0700 > +++ linux-2.6/include/linux/rmap.h 2008-04-02 18:17:10.291070009 -0700 > @@ -94,7 +94,9 @@ enum emm_operation { > emm_release, /* Process exiting, */ > emm_invalidate_start, /* Before the VM unmaps pages */ > emm_invalidate_end, /* After the VM unmapped pages */ > - emm_referenced /* Check if a range was referenced */ > + emm_referenced, /* Check if a range was referenced */ > + emm_stop, /* Halt all faults/invalidate_starts */ > + emm_start, /* Restart operations */ > }; > > struct emm_notifier { > @@ -126,13 +128,15 @@ static inline int emm_notify(struct mm_s > > /* > * Register a notifier with an mm struct. Release occurs when the process > - * terminates by calling the notifier function with emm_release. > + * terminates by calling the notifier function with emm_release or when > + * emm_notifier_unregister is called. > * > * Must hold the mmap_sem for write. 
> */ > extern void emm_notifier_register(struct emm_notifier *e, > struct mm_struct *mm); > - > +extern void emm_notifier_unregister(struct emm_notifier *e, > + struct mm_struct *mm); > > /* > * Called from mm/vmscan.c to handle paging out > Index: linux-2.6/mm/rmap.c > =================================================================== > --- linux-2.6.orig/mm/rmap.c 2008-04-02 18:16:09.378057062 -0700 > +++ linux-2.6/mm/rmap.c 2008-04-02 18:16:10.710079201 -0700 > @@ -289,16 +289,46 @@ void emm_notifier_release(struct mm_stru > /* Register a notifier */ > void emm_notifier_register(struct emm_notifier *e, struct mm_struct *mm) > { > + /* Bring all other notifiers into a quiescent state */ > + emm_notify(mm, emm_stop, 0, TASK_SIZE); > + > e->next = mm->emm_notifier; > + > /* > * The update to emm_notifier (e->next) must be visible > * before the pointer becomes visible. > * rcu_assign_pointer() does exactly what we need. > */ > rcu_assign_pointer(mm->emm_notifier, e); > + > + /* Continue notifiers */ > + emm_notify(mm, emm_start, 0, TASK_SIZE); > } > EXPORT_SYMBOL_GPL(emm_notifier_register); > > +/* Unregister a notifier */ > +void emm_notifier_unregister(struct emm_notifier *e, struct mm_struct *mm) > +{ > + struct emm_notifier *p; > + > + emm_notify(mm, emm_stop, 0, TASK_SIZE); > + > + p = mm->emm_notifier; > + if (e == p) > + mm->emm_notifier = e->next; > + else { > + while (p->next != e) > + p = p->next; > + > + p->next = e->next; > + } > + e->next = mm->emm_notifier; > + > + emm_notify(mm, emm_start, 0, TASK_SIZE); > + e->callback(e, mm, emm_release, 0, TASK_SIZE); > +} > +EXPORT_SYMBOL_GPL(emm_notifier_unregister); > + > /* > * Perform a callback > * > From tziporet at dev.mellanox.co.il Thu Apr 3 04:40:10 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 03 Apr 2008 14:40:10 +0300 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <47F37CA4.8000109@mellanox.co.il> 
Message-ID: <47F4C21A.6090402@mellanox.co.il> Roland Dreier wrote: > Send with invalidate should be OK. Let's see about the masked atomics > stuff -- we have a ton of new verbs and I think we might want to slow > down and make sure it all makes sense. > OK - will send and then we will see what comes out. > > What about the split CQ for UD mode? It's improved the IPoIB > > performance for small messages significantly. > > Oh yeah... I'll try to get that in too. > thanks > > mlx4 - we plan to send patches for the low-level driver only, to enable > > mlx4_en. These only affect our low-level driver. > > No problem in principle, let's see the actual patches. > Sure > > I think we should try to push for XRC in 2.6.26 since there are > > already MPI implementations that use it, and this ties them to use OFED > > only. > > Also, this feature is stable and now being defined in the IBTA. > > Not taking it causes divergence between OFED and the kernel and your > > libibverbs, and we wish to avoid such gaps. > > Is there anything we can do to help make it into 2.6.26? > > I don't have a good feeling that the user-kernel interface is well > thought out, so I want to consider XRC + ehca LL stuff + new iWARP verbs > and make sure we have something that makes sense for the future. > > I see - but can't we figure this all out for the 2.6.26 window? Tziporet From erezz at voltaire.com Thu Apr 3 06:50:59 2008 From: erezz at voltaire.com (Erez Zilber) Date: Thu, 03 Apr 2008 16:50:59 +0300 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> Message-ID: <47F4E0C3.2030100@voltaire.com> > > *OFED 1.4:* > 1. Kernel base: since we target 1.4 release to Sep we target the > kernel base to be 2.6.27 > This is a good target, but we may need to stay with 2.6.26 if the > kernel progress will not be aligned. > > 2. Suggestions for new features: > > * NFS-RDMA > * Verbs: Reliable Multicast (to be presented at Sonoma) > * SDP - Zero copy (There was a question on IPv6 support - seems no > one interested for now) > * IPoIB - continue with performance enhancements > * Xsigo new virtual NIC > * New vendor HW support - non was reported so far (IBM and Chelsio > - do you have something?) > * OpenSM: > o Incremental routing > o Temporary SA DB - to answer queries and a heavy sweep is done > o APM - disjoint paths (?) > o MKey manager (?) 
> o Sasha to send more management features > * MPI: > o Open MPI 1.3 > o APM support in MPI > o mvapich ??? > * uDAPl > o Extensions for new APIs (like XRC) - ? > o uDAPL provider for interop between Windows & Linux > o 1.2 and 2.0 will stay > As I wrote in an earlier discussion (~2 months ago), we plan to add tgt (SCSI target) with iSCSI over iSER (and TCP of course) support. The git tree for tgt already exists on the ofa server. Erez From changquing.tang at hp.com Thu Apr 3 07:27:27 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 3 Apr 2008 14:27:27 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <47F4E0C3.2030100@voltaire.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: Can we address multiple-fabrics (physically separated) support ? --CQ Tang > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Erez Zilber > Sent: Thursday, April 03, 2008 8:51 AM > To: Tziporet Koren > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > > > > *OFED 1.4:* > > 1. Kernel base: since we target 1.4 release to Sep we target the > > kernel base to be 2.6.27 > > This is a good target, but we may need to stay with > 2.6.26 if the > > kernel progress will not be aligned. > > > > 2. Suggestions for new features: > > > > * NFS-RDMA > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > * SDP - Zero copy (There was a question on IPv6 support > - seems no > > one interested for now) > > * IPoIB - continue with performance enhancements > > * Xsigo new virtual NIC > > * New vendor HW support - non was reported so far (IBM > and Chelsio > > - do you have something?) 
> > * OpenSM: > > o Incremental routing > > o Temporary SA DB - to answer queries and a heavy > sweep is done > > o APM - disjoint paths (?) > > o MKey manager (?) > > o Sasha to send more management features > > * MPI: > > o Open MPI 1.3 > > o APM support in MPI > > o mvapich ??? > > * uDAPl > > o Extensions for new APIs (like XRC) - ? > > o uDAPL provider for interop between Windows & Linux > > o 1.2 and 2.0 will stay > > > > As I wrote in an earlier discussion (~2 months ago), we plan > to add tgt (SCSI target) with iSCSI over iSER (and TCP of > course) support. The git tree for tgt already exists on the > ofa server. > > Erez > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From hrosenstock at xsigo.com Thu Apr 3 07:32:01 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 07:32:01 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> CQ, On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: > Can we address multiple-fabrics (physically separated) support ? Can you elaborate on what you mean by "physically separated" ? -- Hal > > > --CQ Tang > > > -----Original Message----- > > From: general-bounces at lists.openfabrics.org > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > > Erez Zilber > > Sent: Thursday, April 03, 2008 8:51 AM > > To: Tziporet Koren > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting > > summary on OFED 1.4 plans > > > > > > > > *OFED 1.4:* > > > 1. 
Kernel base: since we target 1.4 release to Sep we target the > > > kernel base to be 2.6.27 > > > This is a good target, but we may need to stay with > > 2.6.26 if the > > > kernel progress will not be aligned. > > > > > > 2. Suggestions for new features: > > > > > > * NFS-RDMA > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > * SDP - Zero copy (There was a question on IPv6 support > > - seems no > > > one interested for now) > > > * IPoIB - continue with performance enhancements > > > * Xsigo new virtual NIC > > > * New vendor HW support - non was reported so far (IBM > > and Chelsio > > > - do you have something?) > > > * OpenSM: > > > o Incremental routing > > > o Temporary SA DB - to answer queries and a heavy > > sweep is done > > > o APM - disjoint paths (?) > > > o MKey manager (?) > > > o Sasha to send more management features > > > * MPI: > > > o Open MPI 1.3 > > > o APM support in MPI > > > o mvapich ??? > > > * uDAPl > > > o Extensions for new APIs (like XRC) - ? > > > o uDAPL provider for interop between Windows & Linux > > > o 1.2 and 2.0 will stay > > > > > > > As I wrote in an earlier discussion (~2 months ago), we plan > > to add tgt (SCSI target) with iSCSI over iSER (and TCP of > > course) support. The git tree for tgt already exists on the > > ofa server. 
> > > > Erez > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From changquing.tang at hp.com Thu Apr 3 07:40:25 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 3 Apr 2008 14:40:25 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> Message-ID: You have a system, all HCAs have two ports, all port 1 are connected to the first switch, all port 2 are connected to the second switch, there is NO link between the two switches. We call this system has two physically separated fabrics. If you have a bridge link between the two switches, then it becomes a single fabric. The same thing for multiple HCAs on nodes. The problem is, from MPI side, (and by default), we don't know which port is on which fabric, since the subnet prefix is the same. We rely on system admin to config two different subnet prefixes for HP-MPI to work. No vendor has claimed to support this. 
--CQ > -----Original Message----- > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > Sent: Thursday, April 03, 2008 9:32 AM > To: Tang, Changqing > Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > general at lists.openfabrics.org > Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > CQ, > > On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: > > Can we address multiple-fabrics (physically separated) support ? > > Can you elaborate on what you mean by "physically separated" ? > > -- Hal > > > > > > > --CQ Tang > > > > > -----Original Message----- > > > From: general-bounces at lists.openfabrics.org > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > > > Zilber > > > Sent: Thursday, April 03, 2008 8:51 AM > > > To: Tziporet Koren > > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > > > OFED 1.4 plans > > > > > > > > > > > *OFED 1.4:* > > > > 1. Kernel base: since we target 1.4 release to Sep we > target the > > > > kernel base to be 2.6.27 > > > > This is a good target, but we may need to stay with > > > 2.6.26 if the > > > > kernel progress will not be aligned. > > > > > > > > 2. Suggestions for new features: > > > > > > > > * NFS-RDMA > > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > > * SDP - Zero copy (There was a question on IPv6 support > > > - seems no > > > > one interested for now) > > > > * IPoIB - continue with performance enhancements > > > > * Xsigo new virtual NIC > > > > * New vendor HW support - non was reported so far (IBM > > > and Chelsio > > > > - do you have something?) > > > > * OpenSM: > > > > o Incremental routing > > > > o Temporary SA DB - to answer queries and a heavy > > > sweep is done > > > > o APM - disjoint paths (?) > > > > o MKey manager (?) 
> > > > o Sasha to send more management features > > > > * MPI: > > > > o Open MPI 1.3 > > > > o APM support in MPI > > > > o mvapich ??? > > > > * uDAPl > > > > o Extensions for new APIs (like XRC) - ? > > > > o uDAPL provider for interop between Windows & Linux > > > > o 1.2 and 2.0 will stay > > > > > > > > > > As I wrote in an earlier discussion (~2 months ago), we > plan to add > > > tgt (SCSI target) with iSCSI over iSER (and TCP of > > > course) support. The git tree for tgt already exists on the ofa > > > server. > > > > > > Erez > > > > > > _______________________________________________ > > > general mailing list > > > general at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > From jsquyres at cisco.com Thu Apr 3 07:47:52 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 3 Apr 2008 10:47:52 -0400 Subject: [ofa-general] physically separate subnets (was: OFED March 24 meeting summary on OFED 1.4 plans) In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> Message-ID: <32469DBF-3E6F-4072-826D-A52EC29F7A46@cisco.com> In Open MPI, we require physically different ("air gapped") subnets to have different subnet ID's so that we can compute reachability correctly. I don't know how to do it otherwise. 
On Apr 3, 2008, at 10:40 AM, Tang, Changqing wrote: > > You have a system, all HCAs have two ports, all port 1 are connected > to the first switch, > all port 2 are connected to the second switch, there is NO link > between the two switches. > We call this system has two physically separated fabrics. If you > have a bridge link > between the two switches, then it becomes a single fabric. > > The same thing for multiple HCAs on nodes. > > The problem is, from MPI side, (and by default), we don't know which > port is on which > fabric, since the subnet prefix is the same. We rely on system admin > to config two > different subnet prefixes for HP-MPI to work. > > No vendor has claimed to support this. > > --CQ > >> -----Original Message----- >> From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] >> Sent: Thursday, April 03, 2008 9:32 AM >> To: Tang, Changqing >> Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; >> general at lists.openfabrics.org >> Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting >> summary on OFED 1.4 plans >> >> CQ, >> >> On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: >>> Can we address multiple-fabrics (physically separated) support ? >> >> Can you elaborate on what you mean by "physically separated" ? >> >> -- Hal >> >>> >>> >>> --CQ Tang >>> >>>> -----Original Message----- >>>> From: general-bounces at lists.openfabrics.org >>>> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez >>>> Zilber >>>> Sent: Thursday, April 03, 2008 8:51 AM >>>> To: Tziporet Koren >>>> Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org >>>> Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on >>>> OFED 1.4 plans >>>> >>>>> >>>>> *OFED 1.4:* >>>>> 1. Kernel base: since we target 1.4 release to Sep we >> target the >>>>> kernel base to be 2.6.27 >>>>> This is a good target, but we may need to stay with >>>> 2.6.26 if the >>>>> kernel progress will not be aligned. >>>>> >>>>> 2. 
Suggestions for new features: >>>>> >>>>> * NFS-RDMA >>>>> * Verbs: Reliable Multicast (to be presented at Sonoma) >>>>> * SDP - Zero copy (There was a question on IPv6 support >>>> - seems no >>>>> one interested for now) >>>>> * IPoIB - continue with performance enhancements >>>>> * Xsigo new virtual NIC >>>>> * New vendor HW support - non was reported so far (IBM >>>> and Chelsio >>>>> - do you have something?) >>>>> * OpenSM: >>>>> o Incremental routing >>>>> o Temporary SA DB - to answer queries and a heavy >>>> sweep is done >>>>> o APM - disjoint paths (?) >>>>> o MKey manager (?) >>>>> o Sasha to send more management features >>>>> * MPI: >>>>> o Open MPI 1.3 >>>>> o APM support in MPI >>>>> o mvapich ??? >>>>> * uDAPl >>>>> o Extensions for new APIs (like XRC) - ? >>>>> o uDAPL provider for interop between Windows & Linux >>>>> o 1.2 and 2.0 will stay >>>>> >>>> >>>> As I wrote in an earlier discussion (~2 months ago), we >> plan to add >>>> tgt (SCSI target) with iSCSI over iSER (and TCP of >>>> course) support. The git tree for tgt already exists on the ofa >>>> server. 
>>>> >>>> Erez >>>> >>>> _______________________________________________ >>>> general mailing list >>>> general at lists.openfabrics.org >>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>>> >>>> To unsubscribe, please visit >>>> http://openib.org/mailman/listinfo/openib-general >>>> >>> _______________________________________________ >>> general mailing list >>> general at lists.openfabrics.org >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>> >>> To unsubscribe, please visit >>> http://openib.org/mailman/listinfo/openib-general >> >> > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg -- Jeff Squyres Cisco Systems From hrosenstock at xsigo.com Thu Apr 3 07:49:02 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 07:49:02 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207234143.29024.416.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-03 at 14:40 +0000, Tang, Changqing wrote: > You have a system, all HCAs have two ports, all port 1 are connected to the first switch, > all port 2 are connected to the second switch, there is NO link between the two switches. > We call this system has two physically separated fabrics. If you have a bridge link > between the two switches, then it becomes a single fabric. > > The same thing for multiple HCAs on nodes. > > The problem is, from MPI side, (and by default), we don't know which port is on which > fabric, since the subnet prefix is the same. We rely on system admin to config two > different subnet prefixes for HP-MPI to work. Yes, these two IB subnets need two different subnet prefixes. 
(I think it's more than just HP MPI which needs this). -- Hal > No vendor has claimed to support this. > > --CQ > > > -----Original Message----- > > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > > Sent: Thursday, April 03, 2008 9:32 AM > > To: Tang, Changqing > > Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > > general at lists.openfabrics.org > > Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > > summary on OFED 1.4 plans > > > > CQ, > > > > On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: > > > Can we address multiple-fabrics (physically separated) support ? > > > > Can you elaborate on what you mean by "physically separated" ? > > > > -- Hal > > > > > > > > > > > --CQ Tang > > > > > > > -----Original Message----- > > > > From: general-bounces at lists.openfabrics.org > > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > > > > Zilber > > > > Sent: Thursday, April 03, 2008 8:51 AM > > > > To: Tziporet Koren > > > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > > > > OFED 1.4 plans > > > > > > > > > > > > > > *OFED 1.4:* > > > > > 1. Kernel base: since we target 1.4 release to Sep we > > target the > > > > > kernel base to be 2.6.27 > > > > > This is a good target, but we may need to stay with > > > > 2.6.26 if the > > > > > kernel progress will not be aligned. > > > > > > > > > > 2. Suggestions for new features: > > > > > > > > > > * NFS-RDMA > > > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > > > * SDP - Zero copy (There was a question on IPv6 support > > > > - seems no > > > > > one interested for now) > > > > > * IPoIB - continue with performance enhancements > > > > > * Xsigo new virtual NIC > > > > > * New vendor HW support - non was reported so far (IBM > > > > and Chelsio > > > > > - do you have something?) 
> > > > > * OpenSM: > > > > > o Incremental routing > > > > > o Temporary SA DB - to answer queries and a heavy > > > > sweep is done > > > > > o APM - disjoint paths (?) > > > > > o MKey manager (?) > > > > > o Sasha to send more management features > > > > > * MPI: > > > > > o Open MPI 1.3 > > > > > o APM support in MPI > > > > > o mvapich ??? > > > > > * uDAPl > > > > > o Extensions for new APIs (like XRC) - ? > > > > > o uDAPL provider for interop between Windows & Linux > > > > > o 1.2 and 2.0 will stay > > > > > > > > > > > > > As I wrote in an earlier discussion (~2 months ago), we > > plan to add > > > > tgt (SCSI target) with iSCSI over iSER (and TCP of > > > > course) support. The git tree for tgt already exists on the ofa > > > > server. > > > > > > > > Erez > > > > > > > > _______________________________________________ > > > > general mailing list > > > > general at lists.openfabrics.org > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > > > To unsubscribe, please visit > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > _______________________________________________ > > > general mailing list > > > general at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Thu Apr 3 07:52:55 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 07:52:55 -0700 Subject: [ofa-general] Re: [ewg] physically separate subnets (was: OFED March 24 meeting summary on OFED 1.4 plans) In-Reply-To: <32469DBF-3E6F-4072-826D-A52EC29F7A46@cisco.com> References: 
<6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> <32469DBF-3E6F-4072-826D-A52EC29F7A46@cisco.com> Message-ID: <1207234376.29024.419.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-03 at 10:47 -0400, Jeff Squyres wrote: > In Open MPI, we require physically different ("air gapped") subnets to > have different subnet ID's so that we can compute reachability > correctly. Don't understand what the "air gapped" reference means. > I don't know how to do it otherwise. Me neither. -- Hal > > > On Apr 3, 2008, at 10:40 AM, Tang, Changqing wrote: > > > > You have a system, all HCAs have two ports, all port 1 are connected > > to the first switch, > > all port 2 are connected to the second switch, there is NO link > > between the two switches. > > We call this system has two physically separated fabrics. If you > > have a bridge link > > between the two switches, then it becomes a single fabric. > > > > The same thing for multiple HCAs on nodes. > > > > The problem is, from MPI side, (and by default), we don't know which > > port is on which > > fabric, since the subnet prefix is the same. We rely on system admin > > to config two > > different subnet prefixes for HP-MPI to work. > > > > No vendor has claimed to support this. > > > > --CQ > > > >> -----Original Message----- > >> From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > >> Sent: Thursday, April 03, 2008 9:32 AM > >> To: Tang, Changqing > >> Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > >> general at lists.openfabrics.org > >> Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > >> summary on OFED 1.4 plans > >> > >> CQ, > >> > >> On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: > >>> Can we address multiple-fabrics (physically separated) support ? > >> > >> Can you elaborate on what you mean by "physically separated" ? 
> >> > >> -- Hal > >> > >>> > >>> > >>> --CQ Tang > >>> > >>>> -----Original Message----- > >>>> From: general-bounces at lists.openfabrics.org > >>>> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > >>>> Zilber > >>>> Sent: Thursday, April 03, 2008 8:51 AM > >>>> To: Tziporet Koren > >>>> Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > >>>> Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > >>>> OFED 1.4 plans > >>>> > >>>>> > >>>>> *OFED 1.4:* > >>>>> 1. Kernel base: since we target 1.4 release to Sep we > >> target the > >>>>> kernel base to be 2.6.27 > >>>>> This is a good target, but we may need to stay with > >>>> 2.6.26 if the > >>>>> kernel progress will not be aligned. > >>>>> > >>>>> 2. Suggestions for new features: > >>>>> > >>>>> * NFS-RDMA > >>>>> * Verbs: Reliable Multicast (to be presented at Sonoma) > >>>>> * SDP - Zero copy (There was a question on IPv6 support > >>>> - seems no > >>>>> one interested for now) > >>>>> * IPoIB - continue with performance enhancements > >>>>> * Xsigo new virtual NIC > >>>>> * New vendor HW support - non was reported so far (IBM > >>>> and Chelsio > >>>>> - do you have something?) > >>>>> * OpenSM: > >>>>> o Incremental routing > >>>>> o Temporary SA DB - to answer queries and a heavy > >>>> sweep is done > >>>>> o APM - disjoint paths (?) > >>>>> o MKey manager (?) > >>>>> o Sasha to send more management features > >>>>> * MPI: > >>>>> o Open MPI 1.3 > >>>>> o APM support in MPI > >>>>> o mvapich ??? > >>>>> * uDAPl > >>>>> o Extensions for new APIs (like XRC) - ? > >>>>> o uDAPL provider for interop between Windows & Linux > >>>>> o 1.2 and 2.0 will stay > >>>>> > >>>> > >>>> As I wrote in an earlier discussion (~2 months ago), we > >> plan to add > >>>> tgt (SCSI target) with iSCSI over iSER (and TCP of > >>>> course) support. The git tree for tgt already exists on the ofa > >>>> server. 
> >>>> > >>>> Erez > >>>> > >>>> _______________________________________________ > >>>> general mailing list > >>>> general at lists.openfabrics.org > >>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >>>> > >>>> To unsubscribe, please visit > >>>> http://openib.org/mailman/listinfo/openib-general > >>>> > >>> _______________________________________________ > >>> general mailing list > >>> general at lists.openfabrics.org > >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >>> > >>> To unsubscribe, please visit > >>> http://openib.org/mailman/listinfo/openib-general > >> > >> > > _______________________________________________ > > ewg mailing list > > ewg at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > From changquing.tang at hp.com Thu Apr 3 07:53:20 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 3 Apr 2008 14:53:20 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <47F4E0C3.2030100@voltaire.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: One other thing I hope to talk is some fabric query functionalities for normal user, not only just for root. This is at IB verbs level, not rdma_cm level. for example, in MPI, process A know the HCA guid on another node. After running for some time, the switch is restarted for some reason, and the whole fabric is re-configured. Now process A wants to know if the port lid on another node has changed or not, it knows the HCA guid, is there any function to query this ? I know as root, we can use the mad/umad library to do this kind of query, I want to do such query in MPI, which is a normal user. 
--CQ Tang, HP-MPI > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Erez Zilber > Sent: Thursday, April 03, 2008 8:51 AM > To: Tziporet Koren > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > > > > *OFED 1.4:* > > 1. Kernel base: since we target 1.4 release to Sep we target the > > kernel base to be 2.6.27 > > This is a good target, but we may need to stay with > 2.6.26 if the > > kernel progress will not be aligned. > > > > 2. Suggestions for new features: > > > > * NFS-RDMA > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > * SDP - Zero copy (There was a question on IPv6 support > - seems no > > one interested for now) > > * IPoIB - continue with performance enhancements > > * Xsigo new virtual NIC > > * New vendor HW support - non was reported so far (IBM > and Chelsio > > - do you have something?) > > * OpenSM: > > o Incremental routing > > o Temporary SA DB - to answer queries and a heavy > sweep is done > > o APM - disjoint paths (?) > > o MKey manager (?) > > o Sasha to send more management features > > * MPI: > > o Open MPI 1.3 > > o APM support in MPI > > o mvapich ??? > > * uDAPl > > o Extensions for new APIs (like XRC) - ? > > o uDAPL provider for interop between Windows & Linux > > o 1.2 and 2.0 will stay > > > > As I wrote in an earlier discussion (~2 months ago), we plan > to add tgt (SCSI target) with iSCSI over iSER (and TCP of > course) support. The git tree for tgt already exists on the > ofa server. 
> > Erez > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From jsquyres at cisco.com Thu Apr 3 07:54:39 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 3 Apr 2008 10:54:39 -0400 Subject: [ofa-general] Re: [ewg] physically separate subnets (was: OFED March 24 meeting summary on OFED 1.4 plans) In-Reply-To: <1207234376.29024.419.camel@hrosenstock-ws.xsigo.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> <32469DBF-3E6F-4072-826D-A52EC29F7A46@cisco.com> <1207234376.29024.419.camel@hrosenstock-ws.xsigo.com> Message-ID: <0FE92DA6-F7C1-4BE8-BFCA-A7A5089FB0B4@cisco.com> On Apr 3, 2008, at 10:52 AM, Hal Rosenstock wrote: > On Thu, 2008-04-03 at 10:47 -0400, Jeff Squyres wrote: >> In Open MPI, we require physically different ("air gapped") subnets >> to >> have different subnet ID's so that we can compute reachability >> correctly. > > Don't understand what the "air gapped" reference means. There's no physical connection between the two -- there's an "air gap" between the networks (maybe it's a military term :-) ). 
-- Jeff Squyres Cisco Systems From andrea at qumranet.com Thu Apr 3 08:00:48 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 17:00:48 +0200 Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: <1207219246.8514.817.camel@twins> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402212515.GS19189@duo.random> <1207219246.8514.817.camel@twins> Message-ID: <20080403143341.GA9603@duo.random> On Thu, Apr 03, 2008 at 12:40:46PM +0200, Peter Zijlstra wrote: > It seems to me that common code can be shared using functions? No need > FWIW I prefer separate methods. kvm patch using mmu notifiers shares 99% of the code too between the two different methods implemented indeed. Code sharing is the same and if something pointer to functions will be faster if gcc isn't smart or can't create a compile time hash to jump into the right address without having to check every case: . From hrosenstock at xsigo.com Thu Apr 3 08:02:10 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 08:02:10 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: <1207234931.29024.425.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-03 at 14:53 +0000, Tang, Changqing wrote: > One other thing I hope to talk is some fabric query functionalities for normal user, > not only just for root. This is at IB verbs level, not rdma_cm level. > > for example, in MPI, process A know the HCA guid on another node. After running for > some time, the switch is restarted for some reason, and the whole fabric is re-configured. > > Now process A wants to know if the port lid on another node has changed or not, it knows > the HCA guid, is there any function to query this ? 
> I know as root, we can use the mad/umad library to do this kind of query, I want to do > such query in MPI, which is a normal user. In the IB arch, there are SA registrations and queries for the specific example you used. However, these are not directly exposed to Linux user space directly (for the normal user as opposed to MAD user (note there are some difficulties in making this available to the normal user)) (at least not yet AFAIK). While these are not (direct) fabric query (really SA query), they serve the same function in a different way. -- Hal > --CQ Tang, HP-MPI > > > > > -----Original Message----- > > From: general-bounces at lists.openfabrics.org > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > > Erez Zilber > > Sent: Thursday, April 03, 2008 8:51 AM > > To: Tziporet Koren > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting > > summary on OFED 1.4 plans > > > > > > > > *OFED 1.4:* > > > 1. Kernel base: since we target 1.4 release to Sep we target the > > > kernel base to be 2.6.27 > > > This is a good target, but we may need to stay with > > 2.6.26 if the > > > kernel progress will not be aligned. > > > > > > 2. Suggestions for new features: > > > > > > * NFS-RDMA > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > * SDP - Zero copy (There was a question on IPv6 support > > - seems no > > > one interested for now) > > > * IPoIB - continue with performance enhancements > > > * Xsigo new virtual NIC > > > * New vendor HW support - non was reported so far (IBM > > and Chelsio > > > - do you have something?) > > > * OpenSM: > > > o Incremental routing > > > o Temporary SA DB - to answer queries and a heavy > > sweep is done > > > o APM - disjoint paths (?) > > > o MKey manager (?) > > > o Sasha to send more management features > > > * MPI: > > > o Open MPI 1.3 > > > o APM support in MPI > > > o mvapich ??? 
> > > * uDAPl > > > o Extensions for new APIs (like XRC) - ? > > > o uDAPL provider for interop between Windows & Linux > > > o 1.2 and 2.0 will stay > > > > > > > As I wrote in an earlier discussion (~2 months ago), we plan > > to add tgt (SCSI target) with iSCSI over iSER (and TCP of > > course) support. The git tree for tgt already exists on the > > ofa server. > > > > Erez > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From changquing.tang at hp.com Thu Apr 3 08:11:10 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 3 Apr 2008 15:11:10 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <1207234931.29024.425.camel@hrosenstock-ws.xsigo.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207234931.29024.425.camel@hrosenstock-ws.xsigo.com> Message-ID: Thanks. When can we have the SA features, very soon, long time, or never ? --CQ > -----Original Message----- > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > Sent: Thursday, April 03, 2008 10:02 AM > To: Tang, Changqing > Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > general at lists.openfabrics.org > Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > On Thu, 2008-04-03 at 14:53 +0000, Tang, Changqing wrote: > > One other thing I hope to talk is some fabric query functionalities > > for normal user, not only just for root. This is at IB > verbs level, not rdma_cm level. > > > > for example, in MPI, process A know the HCA guid on another node. 
> > After running for some time, the switch is restarted for > some reason, and the whole fabric is re-configured. > > > > Now process A wants to know if the port lid on another node has > > changed or not, it knows the HCA guid, is there any > function to query this ? > > > I know as root, we can use the mad/umad library to do this kind of > > query, I want to do such query in MPI, which is a normal user. > > In the IB arch, there are SA registrations and queries for > the specific example you used. However, these are not > directly exposed to Linux user space directly (for the normal > user as opposed to MAD user (note there are some difficulties > in making this available to the normal user)) (at least not > yet AFAIK). While these are not (direct) fabric query (really > SA query), they serve the same function in a different way. > > -- Hal > > > --CQ Tang, HP-MPI > > > > > > > > > -----Original Message----- > > > From: general-bounces at lists.openfabrics.org > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > > > Zilber > > > Sent: Thursday, April 03, 2008 8:51 AM > > > To: Tziporet Koren > > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > > > OFED 1.4 plans > > > > > > > > > > > *OFED 1.4:* > > > > 1. Kernel base: since we target 1.4 release to Sep we > target the > > > > kernel base to be 2.6.27 > > > > This is a good target, but we may need to stay with > > > 2.6.26 if the > > > > kernel progress will not be aligned. > > > > > > > > 2. 
Suggestions for new features: > > > > > > > > * NFS-RDMA > > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > > * SDP - Zero copy (There was a question on IPv6 support > > > - seems no > > > > one interested for now) > > > > * IPoIB - continue with performance enhancements > > > > * Xsigo new virtual NIC > > > > * New vendor HW support - non was reported so far (IBM > > > and Chelsio > > > > - do you have something?) > > > > * OpenSM: > > > > o Incremental routing > > > > o Temporary SA DB - to answer queries and a heavy > > > sweep is done > > > > o APM - disjoint paths (?) > > > > o MKey manager (?) > > > > o Sasha to send more management features > > > > * MPI: > > > > o Open MPI 1.3 > > > > o APM support in MPI > > > > o mvapich ??? > > > > * uDAPl > > > > o Extensions for new APIs (like XRC) - ? > > > > o uDAPL provider for interop between Windows & Linux > > > > o 1.2 and 2.0 will stay > > > > > > > > > > As I wrote in an earlier discussion (~2 months ago), we > plan to add > > > tgt (SCSI target) with iSCSI over iSER (and TCP of > > > course) support. The git tree for tgt already exists on the ofa > > > server. > > > > > > Erez > > > > > > _______________________________________________ > > > general mailing list > > > general at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > > ewg mailing list > > ewg at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > From swise at opengridcomputing.com Thu Apr 3 08:17:58 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 10:17:58 -0500 Subject: [ofa-general] RE: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? 
In-Reply-To: References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> Message-ID: <47F4F526.3060709@opengridcomputing.com> I think RDS might be getting confused because the 10GbE rnic shows up as a dumb NIC hooked into the native TCP stack -and- an rdma device. Jon Mason will be working to enable RDS soon on the chelsio device. He'll feed back the changes needed, if any, to RDS. Stay tuned. However, Scott if you want to debug this further, we can support you. Steve. Scott Weitzenkamp (sweitzen) wrote: > Yes, it's an iWARP NIC, and the OFED 1.3 perftest ib_rdma_lat program is > working. > > Scott > > >> -----Original Message----- >> From: Richard Frank [mailto:richard.frank at oracle.com] >> Sent: Wednesday, April 02, 2008 11:04 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: rds-devel at oss.oracle.com; [ofa_general] >> Subject: Re: [rds-devel] Has anyone tried running RDS over >> 10GE / IWARP NICs ? >> >> RDS does not run over regular 10G NICs - that appear as simple NICS - >> this was disabled in 1.3. >> >> For now we are interested in RDS over IWARP NICS - configured as >> accessible via the verbs interfaces. 
>> >> Richard Frank wrote: >>> is the rds driver loaded (modprobe rds) >>> >>> Scott Weitzenkamp (sweitzen) wrote: >>> >>>> Does't appear to work with Chelsio and OFED 1.3: >>>> >>>> [root at svbu-qa2950-1 counters]# ethtool -i eth2 >>>> driver: cxgb3 >>>> version: 1.0-ofed >>>> firmware-version: T 5.0.0 TP 1.1.0 >>>> bus-info: 0000:0b:00.0 >>>> [root at svbu-qa2950-1 counters]# ifconfig eth2 >>>> eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F >>>> inet addr:192.168.0.198 Bcast:192.168.0.255 >>>> Mask:255.255.255.0 >>>> inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link >>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>>> RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 >>>> TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 >>>> collisions:0 txqueuelen:1000 >>>> RX bytes:207891512 (198.2 MiB) TX bytes:9348152 >> (8.9 MiB) >>>> Interrupt:169 Memory:fceff000-fcefffff >>>> >>>> [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 >>>> rds-sink: Unable to bind socket: Cannot assign requested address >>>> >>>> Scott Weitzenkamp >>>> SQA and Release Manager >>>> Data Center Access Engineering >>>> Cisco Systems >>>> >>>> >>>> >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: rds-devel-bounces at oss.oracle.com >>>>> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of >> Richard Frank >>>>> Sent: Wednesday, April 02, 2008 10:31 AM >>>>> To: rds-devel at oss.oracle.com; [ofa_general] >>>>> Subject: [rds-devel] Has anyone tried running RDS over 10GE / >>>>> IWARP NICs ? >>>>> >>>>> We'd appreciate some feed back on your experience and would >>>>> like to sort >>>>> out any issues ASAP. 
>>>>> >>>>> Rick >>>>> >>>>> _______________________________________________ >>>>> rds-devel mailing list >>>>> rds-devel at oss.oracle.com >>>>> http://oss.oracle.com/mailman/listinfo/rds-devel >>>>> >>>>> >>>>> >>> _______________________________________________ >>> rds-devel mailing list >>> rds-devel at oss.oracle.com >>> http://oss.oracle.com/mailman/listinfo/rds-devel >>> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Thu Apr 3 08:20:33 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 08:20:33 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207234931.29024.425.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207236033.29024.430.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-03 at 15:11 +0000, Tang, Changqing wrote: > Thanks. When can we have the SA features, very soon, long time, or never ? I'm unaware of any current plans to implement these but my knowledge is far from complete... -- Hal > --CQ > > > -----Original Message----- > > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > > Sent: Thursday, April 03, 2008 10:02 AM > > To: Tang, Changqing > > Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > > general at lists.openfabrics.org > > Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > > summary on OFED 1.4 plans > > > > On Thu, 2008-04-03 at 14:53 +0000, Tang, Changqing wrote: > > > One other thing I hope to talk is some fabric query functionalities > > > for normal user, not only just for root. This is at IB > > verbs level, not rdma_cm level. 
> > > > > > for example, in MPI, process A know the HCA guid on another node. > > > After running for some time, the switch is restarted for > > some reason, and the whole fabric is re-configured. > > > > > > Now process A wants to know if the port lid on another node has > > > changed or not, it knows the HCA guid, is there any > > function to query this ? > > > > > I know as root, we can use the mad/umad library to do this kind of > > > query, I want to do such query in MPI, which is a normal user. > > > > In the IB arch, there are SA registrations and queries for > > the specific example you used. However, these are not > > directly exposed to Linux user space directly (for the normal > > user as opposed to MAD user (note there are some difficulties > > in making this available to the normal user)) (at least not > > yet AFAIK). While these are not (direct) fabric query (really > > SA query), they serve the same function in a different way. > > > > -- Hal > > > > > --CQ Tang, HP-MPI > > > > > > > > > > > > > -----Original Message----- > > > > From: general-bounces at lists.openfabrics.org > > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > > > > Zilber > > > > Sent: Thursday, April 03, 2008 8:51 AM > > > > To: Tziporet Koren > > > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > > > > OFED 1.4 plans > > > > > > > > > > > > > > *OFED 1.4:* > > > > > 1. Kernel base: since we target 1.4 release to Sep we > > target the > > > > > kernel base to be 2.6.27 > > > > > This is a good target, but we may need to stay with > > > > 2.6.26 if the > > > > > kernel progress will not be aligned. > > > > > > > > > > 2. 
Suggestions for new features: > > > > > > > > > > * NFS-RDMA > > > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > > > * SDP - Zero copy (There was a question on IPv6 support > > > > - seems no > > > > > one interested for now) > > > > > * IPoIB - continue with performance enhancements > > > > > * Xsigo new virtual NIC > > > > > * New vendor HW support - non was reported so far (IBM > > > > and Chelsio > > > > > - do you have something?) > > > > > * OpenSM: > > > > > o Incremental routing > > > > > o Temporary SA DB - to answer queries and a heavy > > > > sweep is done > > > > > o APM - disjoint paths (?) > > > > > o MKey manager (?) > > > > > o Sasha to send more management features > > > > > * MPI: > > > > > o Open MPI 1.3 > > > > > o APM support in MPI > > > > > o mvapich ??? > > > > > * uDAPl > > > > > o Extensions for new APIs (like XRC) - ? > > > > > o uDAPL provider for interop between Windows & Linux > > > > > o 1.2 and 2.0 will stay > > > > > > > > > > > > > As I wrote in an earlier discussion (~2 months ago), we > > plan to add > > > > tgt (SCSI target) with iSCSI over iSER (and TCP of > > > > course) support. The git tree for tgt already exists on the ofa > > > > server. 
> > > > > > > > Erez > > > > > > > > _______________________________________________ > > > > general mailing list > > > > general at lists.openfabrics.org > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > > > To unsubscribe, please visit > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > _______________________________________________ > > > ewg mailing list > > > ewg at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > > > From sashak at voltaire.com Thu Apr 3 11:25:09 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 3 Apr 2008 18:25:09 +0000 Subject: [ofa-general] Re: [Infiniband-Diags] [PATCH] saquery exit with non-zero code on bad input In-Reply-To: <1207074579.15637.153.camel@cardanus.llnl.gov> References: <1207074579.15637.153.camel@cardanus.llnl.gov> Message-ID: <20080403182509.GE5982@sashak.voltaire.com> Hi Al, On 11:29 Tue 01 Apr , Al Chu wrote: > > If an input into saquery isn't found, saquery still exits with '0' > status, so it poses a problem in scripting. > > This patch exits w/ non-zero if the input isn't found by saquery. I guess by input you mean "SA records". Right? > The actual status code I selected to return can be revised. I just sort > of picked one. This patch cares only about print_node_records()? What about other queries? > Signed-off-by: Albert L. 
Chu > --- > infiniband-diags/src/saquery.c | 13 +++++++++++++ > 1 files changed, 13 insertions(+), 0 deletions(-) > > diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c > index ed61721..f801385 100644 > --- a/infiniband-diags/src/saquery.c > +++ b/infiniband-diags/src/saquery.c > @@ -839,6 +839,7 @@ print_node_records(osm_bind_handle_t bind_handle) > ib_node_record_t *node_record = NULL; > ib_net16_t attr_offset = ib_get_attr_offset(sizeof(*node_record)); > ib_api_status_t status; > + unsigned int output_count = 0; > > status = get_all_records(bind_handle, IB_MAD_ATTR_NODE_RECORD, attr_offset, 0); > if (status != IB_SUCCESS) > @@ -855,12 +856,14 @@ print_node_records(osm_bind_handle_t bind_handle) > } else if (node_print_desc == NAME_OF_LID) { > if (requested_lid == cl_ntoh16(node_record->lid)) { > print_node_record(node_record); > + output_count++; > } > } else if (node_print_desc == NAME_OF_GUID) { > ib_node_info_t *p_ni = &(node_record->node_info); > > if (requested_guid == cl_ntoh64(p_ni->port_guid)) { > print_node_record(node_record); > + output_count++; > } > } else { > if (!requested_name || > @@ -868,6 +871,7 @@ print_node_records(osm_bind_handle_t bind_handle) > (char *)node_record->node_desc.description, > sizeof(node_record->node_desc.description)) == 0)) { > print_node_record(node_record); > + output_count++; > if (node_print_desc == UNIQUE_LID_ONLY) { > return_mad(); > exit(0); > @@ -876,6 +880,15 @@ print_node_records(osm_bind_handle_t bind_handle) > } > } > return_mad(); > + if ((requested_lid_flag > + || requested_guid_flag > + || requested_name) > + && !output_count) { > + /* need non-zero error code to indicate input not matched. > + * this seems as good as any other status error code. > + */ > + status = IB_NOT_FOUND; > + } > return (status); What about just to 'return result.status' here? 
Sasha From andrea at qumranet.com Thu Apr 3 08:29:36 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 17:29:36 +0200 Subject: [ofa-general] Re: EMM: disable other notifiers before register and unregister In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> Message-ID: <20080403151908.GB9603@duo.random> On Wed, Apr 02, 2008 at 06:24:15PM -0700, Christoph Lameter wrote: > Ok lets forget about the single theaded thing to solve the registration > races. As Andrea pointed out this still has ssues with other subscribed > subsystems (and also try_to_unmap). We could do something like what > stop_machine_run does: First disable all running subsystems before > registering a new one. > > Maybe this is a possible solution. It still doesn't solve this kernel crash. CPU0 CPU1 range_start (mmu notifier chain is empty) range_start returns mmu_notifier_register kvm_emm_stop (how kvm can ever know the other cpu is in the middle of the critical section?) kvm page fault (kvm thinks mmu_notifier_register serialized) zap ptes free_page mapped by spte/GRU and not pinned -> crash There's no way the lowlevel can stop mmu_notifier_register and if mmu_notifier_register returns, then sptes will be instantiated and it'll corrupt memory the same way. The seqlock was fine, what is wrong is the assumption that we can let the lowlevel driver handle a range_end happening without range_begin before it. The problem is that by design the lowlevel can't handle a range_end happening without a range_begin before it. This is the core kernel crashing problem we have (it's a kernel crashing issue only for drivers that don't pin the pages, so XPMEM wouldn't crash but still it would leak memory, which is a more graceful failure than random mm corruption). 
The basic trouble is that sometimes range_begin/end critical sections run outside the mmap_sem (see try_to_unmap_cluster in #v10 or even try_to_unmap_one only in EMM-V2). My attempt to fix this once and for all is to walk all vmas of the "mm" inside mmu_notifier_register and take all anon_vma locks and i_mmap_locks in virtual address order in a row. It's ok to take those inside the mmap_sem. Presumably anybody who ever takes two of these locks already takes them in order too. Then I can dump all the other locking and remove the seqlock, and the driver is guaranteed there will be a single call of range_begin followed by a single call of range_end the whole time and no race could ever happen, and there won't be repeated calls of range_begin that would screw up recursive semaphore locking. The patch won't be pretty; I guess I'll vmalloc an array of pointers to locks to reorder them. It doesn't need to be fast. Also the locks can't go away from under us while we hold the down_write(mmap_sem) because the vmas can be altered only with down_write(mmap_sem) (modulo vm_start/vm_end that can be modified with only down_read(mmap_sem) + page_table_lock like in growsdown page faults). So it should be ok to take all those locks inside the mmap_sem and implement a lock_vm(mm) unlock_vm(mm). I'll think more about this hammer approach while I try to implement it... From swise at opengridcomputing.com Thu Apr 3 09:18:32 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 11:18:32 -0500 Subject: [ofa-general] Re: [ewg] how do I use uDAPL with iWARP? In-Reply-To: References: Message-ID: <47F50358.1010000@opengridcomputing.com> Scott Weitzenkamp (sweitzen) wrote: > I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have > ib_rdma_lat working, so I know IB verbs are working. > > How do I use uDAPL, though? All the default /etc/dat.conf entries have > IPoIB or bonding interfaces in them. 
> > Add an entry like this: cxgb u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ethx 0" "" Where ethx is the ethernet interface for the chelsio device. Also, last time I ran it you needed this in your env: export DAPL_MAX_INLINE=64 Steve. From swise at opengridcomputing.com Thu Apr 3 09:19:07 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 11:19:07 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: <47F5037B.3020501@opengridcomputing.com> Scott Weitzenkamp (sweitzen) wrote: > I tried that, and it didn't work: > > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" > "" > [root at svbu-qa2950-1 ~]# dtest > 10194 Running as server - OpenIB-cma > 10194 Error dat_ep_create: DAT_INVALID_HANDLE > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP > try setting DAPL_MAX_INLINE=64 From sweitzen at cisco.com Thu Apr 3 09:27:58 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 3 Apr 2008 09:27:58 -0700 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: <47F5037B.3020501@opengridcomputing.com> References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> <47F5037B.3020501@opengridcomputing.com> Message-ID: Steve, Thanks, that gets further, but dtest still fails. Client side: [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 13926 Running as client - OpenIB-cma 13926 Server Name: 192.168.0.198 13926 Server Net Address: 192.168.0.198 13926 Waiting for connect response 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE 13926 Error connect_ep: DAT_ABORT 13926: DAPL Test Complete. 
13926: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 13926: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 13926: open: 36619.19 usec 13926: close: 32500.98 usec 13926: PZ create: 7.87 usec 13926: PZ free: 4.05 usec 13926: LMR create: 58.89 usec 13926: LMR free: 11.92 usec 13926: EVD create: 9.78 usec 13926: EVD free: 14.07 usec 13926: EP create: 78.92 usec 13926: EP free: 26.23 usec 13926: TOTAL: 199.79 usec Server side: [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest 11461 Running as server - OpenIB-cma 11461 Server waiting for connect request.. 11461 Waiting for connect response 11461 CONNECTED! 11461 Send RMR to remote: snd_msg: r_key_ctx=bff,pad=0,va=146db580,len=0x40 11461 Waiting for remote to send RMR data 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED 11461: DAPL Test Complete. 
11461: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 11461: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 11461: open: 900676.01 usec 11461: close: 31543.97 usec 11461: PZ create: 7.87 usec 11461: PZ free: 5.01 usec 11461: LMR create: 51.98 usec 11461: LMR free: 12.16 usec 11461: EVD create: 10.97 usec 11461: EVD free: 12.87 usec 11461: EP create: 77.01 usec 11461: EP free: 30.04 usec 11461: TOTAL: 195.03 usec Scott > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Thursday, April 03, 2008 9:19 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Joshua Bernstein; OpenFabrics EWG; [ofa_general] > Subject: Re: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? > > > > Scott Weitzenkamp (sweitzen) wrote: > > I tried that, and it didn't work: > > > > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf > > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 > dapl.1.2 "eth2 0" > > "" > > [root at svbu-qa2950-1 ~]# dtest > > 10194 Running as server - OpenIB-cma > > 10194 Error dat_ep_create: DAT_INVALID_HANDLE > > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP > > > > try setting DAPL_MAX_INLINE=64 > > From swise at opengridcomputing.com Thu Apr 3 09:35:27 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 11:35:27 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> <47F5037B.3020501@opengridcomputing.com> Message-ID: <47F5074F.9000202@opengridcomputing.com> What does your network interface config look like? Does rping work? 
Scott Weitzenkamp (sweitzen) wrote: > Steve, > > Thanks, that gets further, but dtest still fails. > > Client side: > > [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 > 13926 Running as client - OpenIB-cma > 13926 Server Name: 192.168.0.198 > 13926 Server Net Address: 192.168.0.198 > 13926 Waiting for connect response > 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE > 13926 Error connect_ep: DAT_ABORT > > 13926: DAPL Test Complete. > > 13926: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 13926: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: open: 36619.19 usec > 13926: close: 32500.98 usec > 13926: PZ create: 7.87 usec > 13926: PZ free: 4.05 usec > 13926: LMR create: 58.89 usec > 13926: LMR free: 11.92 usec > 13926: EVD create: 9.78 usec > 13926: EVD free: 14.07 usec > 13926: EP create: 78.92 usec > 13926: EP free: 26.23 usec > 13926: TOTAL: 199.79 usec > > Server side: > > [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest > 11461 Running as server - OpenIB-cma > 11461 Server waiting for connect request.. > 11461 Waiting for connect response > > 11461 CONNECTED! > > 11461 Send RMR to remote: snd_msg: > r_key_ctx=bff,pad=0,va=146db580,len=0x40 > 11461 Waiting for remote to send RMR data > 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED > 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED > > 11461: DAPL Test Complete. 
> > 11461: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 11461: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: open: 900676.01 usec > 11461: close: 31543.97 usec > 11461: PZ create: 7.87 usec > 11461: PZ free: 5.01 usec > 11461: LMR create: 51.98 usec > 11461: LMR free: 12.16 usec > 11461: EVD create: 10.97 usec > 11461: EVD free: 12.87 usec > 11461: EP create: 77.01 usec > 11461: EP free: 30.04 usec > 11461: TOTAL: 195.03 usec > > Scott > > > >> -----Original Message----- >> From: Steve Wise [mailto:swise at opengridcomputing.com] >> Sent: Thursday, April 03, 2008 9:19 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: Joshua Bernstein; OpenFabrics EWG; [ofa_general] >> Subject: Re: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? >> >> >> >> Scott Weitzenkamp (sweitzen) wrote: >>> I tried that, and it didn't work: >>> >>> [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >>> OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 >> dapl.1.2 "eth2 0" >>> "" >>> [root at svbu-qa2950-1 ~]# dtest >>> 10194 Running as server - OpenIB-cma >>> 10194 Error dat_ep_create: DAT_INVALID_HANDLE >>> 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >>> >> try setting DAPL_MAX_INLINE=64 >> >> From swise at opengridcomputing.com Thu Apr 3 09:57:12 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 11:57:12 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> <47F5037B.3020501@opengridcomputing.com> Message-ID: <47F50C68.4000601@opengridcomputing.com> I can reproduce this. 
Lemme dig into it... Steve. Scott Weitzenkamp (sweitzen) wrote: > Steve, > > Thanks, that gets further, but dtest still fails. > > Client side: > > [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 > 13926 Running as client - OpenIB-cma > 13926 Server Name: 192.168.0.198 > 13926 Server Net Address: 192.168.0.198 > 13926 Waiting for connect response > 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE > 13926 Error connect_ep: DAT_ABORT > > 13926: DAPL Test Complete. > > 13926: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 13926: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: open: 36619.19 usec > 13926: close: 32500.98 usec > 13926: PZ create: 7.87 usec > 13926: PZ free: 4.05 usec > 13926: LMR create: 58.89 usec > 13926: LMR free: 11.92 usec > 13926: EVD create: 9.78 usec > 13926: EVD free: 14.07 usec > 13926: EP create: 78.92 usec > 13926: EP free: 26.23 usec > 13926: TOTAL: 199.79 usec > > Server side: > > [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest > 11461 Running as server - OpenIB-cma > 11461 Server waiting for connect request.. > 11461 Waiting for connect response > > 11461 CONNECTED! > > 11461 Send RMR to remote: snd_msg: > r_key_ctx=bff,pad=0,va=146db580,len=0x40 > 11461 Waiting for remote to send RMR data > 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED > 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED > > 11461: DAPL Test Complete. 
> > 11461: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 11461: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: open: 900676.01 usec > 11461: close: 31543.97 usec > 11461: PZ create: 7.87 usec > 11461: PZ free: 5.01 usec > 11461: LMR create: 51.98 usec > 11461: LMR free: 12.16 usec > 11461: EVD create: 10.97 usec > 11461: EVD free: 12.87 usec > 11461: EP create: 77.01 usec > 11461: EP free: 30.04 usec > 11461: TOTAL: 195.03 usec > > Scott > > > >> -----Original Message----- >> From: Steve Wise [mailto:swise at opengridcomputing.com] >> Sent: Thursday, April 03, 2008 9:19 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: Joshua Bernstein; OpenFabrics EWG; [ofa_general] >> Subject: Re: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? >> >> >> >> Scott Weitzenkamp (sweitzen) wrote: >>> I tried that, and it didn't work: >>> >>> [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >>> OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 >> dapl.1.2 "eth2 0" >>> "" >>> [root at svbu-qa2950-1 ~]# dtest >>> 10194 Running as server - OpenIB-cma >>> 10194 Error dat_ep_create: DAT_INVALID_HANDLE >>> 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >>> >> try setting DAPL_MAX_INLINE=64 >> >> From arlin.r.davis at intel.com Thu Apr 3 10:00:09 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 3 Apr 2008 10:00:09 -0700 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? 
In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com><47F5037B.3020501@opengridcomputing.com> Message-ID: >Client side: > >[releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 >13926 Running as client - OpenIB-cma >13926 Server Name: 192.168.0.198 >13926 Server Net Address: 192.168.0.198 >13926 Waiting for connect response >13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE >13926 Error connect_ep: DAT_ABORT > >Server side: > >[releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest >11461 Running as server - OpenIB-cma >11461 Server waiting for connect request.. >11461 Waiting for connect response > >11461 CONNECTED! > >11461 Send RMR to remote: snd_msg: >r_key_ctx=bff,pad=0,va=146db580,len=0x40 >11461 Waiting for remote to send RMR data >11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED >11461 Error connect_ep: DAT_TIMEOUT_EXPIRED > Interesting that the server gets connected but client doesn't. If you build the dapl package with --enable-debug and set DAPL_DBG_TYPE=0xffff we can see what is going on with rdma_cm events on the client side. See http://www.openfabrics.org//downloads/dapl/documentation/uDAPL_ofed_test ing_bkm.pdf for debugging details. uDAPL uses rdma_cm to connect similar to rping and ib_rdma_lat -c so it would be helpful to see if you have any luck with either rping or ib_rdma_lat -c? BTW: the default OFED 1.3 setting for DAPL_MAX_ININE is 64 so you shouldn't have to adjust down from OFED 1.2.5 default of 128 anymore for the chelsio device. 
-arlin From chu11 at llnl.gov Thu Apr 3 10:01:16 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 03 Apr 2008 10:01:16 -0700 Subject: [ofa-general] Re: [Infiniband-Diags] [PATCH] saquery exit with non-zero code on bad input In-Reply-To: <20080403182509.GE5982@sashak.voltaire.com> References: <1207074579.15637.153.camel@cardanus.llnl.gov> <20080403182509.GE5982@sashak.voltaire.com> Message-ID: <1207242076.15637.282.camel@cardanus.llnl.gov> Hey Sasha, On Thu, 2008-04-03 at 18:25 +0000, Sasha Khapyorsky wrote: > Hi Al, > > On 11:29 Tue 01 Apr , Al Chu wrote: > > > > If an input into saquery isn't found, saquery still exits with '0' > > status, so it poses a problem in scripting. > > > > This patch exits w/ non-zero if the input isn't found by saquery. > > I guess by input you mean "SA records". Right? When the user inputs a nodename, lid, or guid, normally for a noderecord info query (-N). > > The actual status code I selected to return can be revised. I just sort > > of picked one. > > This patch cares only about print_node_records()? What about other > queries? As far as I can tell, most of the other queries do result in a non-zero exit code already when an input isn't found. wopri at root:./saquery --src-to-dst fake:fake; echo $? Failed to find lid for "fake" Failed to find lid for "fake" Path record for fake -> fake 50 A little more playing around suggests there are some queries that also have issues. wopri at root:./saquery -x fakename; echo $? Failed to find lid for "fakename" LinkRecord dump: FromLID....................17 FromPort...................1 ToPort.....................1 ToLID......................11 0 I suppose we should handle this one in a different patch. > > Signed-off-by: Albert L. 
Chu > > --- > > infiniband-diags/src/saquery.c | 13 +++++++++++++ > > 1 files changed, 13 insertions(+), 0 deletions(-) > > > > diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c > > index ed61721..f801385 100644 > > --- a/infiniband-diags/src/saquery.c > > +++ b/infiniband-diags/src/saquery.c > > @@ -839,6 +839,7 @@ print_node_records(osm_bind_handle_t bind_handle) > > ib_node_record_t *node_record = NULL; > > ib_net16_t attr_offset = ib_get_attr_offset(sizeof(*node_record)); > > ib_api_status_t status; > > + unsigned int output_count = 0; > > > > status = get_all_records(bind_handle, IB_MAD_ATTR_NODE_RECORD, attr_offset, 0); > > if (status != IB_SUCCESS) > > @@ -855,12 +856,14 @@ print_node_records(osm_bind_handle_t bind_handle) > > } else if (node_print_desc == NAME_OF_LID) { > > if (requested_lid == cl_ntoh16(node_record->lid)) { > > print_node_record(node_record); > > + output_count++; > > } > > } else if (node_print_desc == NAME_OF_GUID) { > > ib_node_info_t *p_ni = &(node_record->node_info); > > > > if (requested_guid == cl_ntoh64(p_ni->port_guid)) { > > print_node_record(node_record); > > + output_count++; > > } > > } else { > > if (!requested_name || > > @@ -868,6 +871,7 @@ print_node_records(osm_bind_handle_t bind_handle) > > (char *)node_record->node_desc.description, > > sizeof(node_record->node_desc.description)) == 0)) { > > print_node_record(node_record); > > + output_count++; > > if (node_print_desc == UNIQUE_LID_ONLY) { > > return_mad(); > > exit(0); > > @@ -876,6 +880,15 @@ print_node_records(osm_bind_handle_t bind_handle) > > } > > } > > return_mad(); > > + if ((requested_lid_flag > > + || requested_guid_flag > > + || requested_name) > > + && !output_count) { > > + /* need non-zero error code to indicate input not matched. > > + * this seems as good as any other status error code. > > + */ > > + status = IB_NOT_FOUND; > > + } > > return (status); > > What about just to 'return result.status' here? 
If the user input a string/lid/guid that doesn't exist in the fabric, print_node_records() can still return 0 b/c the current status is based solely on the success of the call to get_all_records(), not on whether the user's input was found or not. Al > Sasha -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From swise at opengridcomputing.com Thu Apr 3 10:25:05 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 12:25:05 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com><47F5037B.3020501@opengridcomputing.com> Message-ID: <47F512F1.1070508@opengridcomputing.com> Davis, Arlin R wrote: > >> Client side: >> >> [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 >> 13926 Running as client - OpenIB-cma >> 13926 Server Name: 192.168.0.198 >> 13926 Server Net Address: 192.168.0.198 >> 13926 Waiting for connect response >> 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE >> 13926 Error connect_ep: DAT_ABORT >> >> Server side: >> >> [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest >> 11461 Running as server - OpenIB-cma >> 11461 Server waiting for connect request.. >> 11461 Waiting for connect response >> >> 11461 CONNECTED! >> >> 11461 Send RMR to remote: snd_msg: >> r_key_ctx=bff,pad=0,va=146db580,len=0x40 >> 11461 Waiting for remote to send RMR data >> 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED >> 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED >> > > Interesting that the server gets connected but client doesn't. If you > build the dapl package with --enable-debug and set DAPL_DBG_TYPE=0xffff > we can see what is going on with rdma_cm events on the client side. > > See > http://www.openfabrics.org//downloads/dapl/documentation/uDAPL_ofed_test > ing_bkm.pdf for debugging details. 
> > uDAPL uses rdma_cm to connect similar to rping and ib_rdma_lat -c so it > would be helpful to see if you have any luck with either rping or > ib_rdma_lat -c? > > BTW: the default OFED 1.3 setting for DAPL_MAX_ININE is 64 so you > shouldn't have to adjust down from OFED 1.2.5 default of 128 anymore for > the chelsio device. > > -arlin > Hey Arlin, Seems like we still need DAPL_MAX_INLINE=64 for chelsio for some reason... From swise at opengridcomputing.com Thu Apr 3 10:48:11 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 12:48:11 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: <47F50C68.4000601@opengridcomputing.com> References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> <47F5037B.3020501@opengridcomputing.com> <47F50C68.4000601@opengridcomputing.com> Message-ID: <47F5185B.6070309@opengridcomputing.com> Guys, I think this is the same iWARP issue that has been biting me for a while: The client must send the first RDMA message. The dtest app is a peer-2-peer (p2p) application where both sides send immediately after setting up the connection. So dtest doesn't adhere to the iWARP specification (I know: the iWARP spec is broken :). News: I have some prototype FW from chelsio that supports p2p setup and with that FW and my associated iw_cxgb3 driver/library changes, then dtest seems to work fine. These changes will be published upstream soon in order to support Open MPI and other p2p applications for chelsio. For this initial release of p2p support over chelsio, the functionality will be 100% handled in the iw_cxgb3 driver and fw. This is similar to what iw_nes does today with its send_first module option to send a 0B write from the client and defer connection establishment on the server until the 0B write is received. 
Chelsio will have a similar module option called peer2peer (or I could make it the same option name: send_first) that will use a 0B read to force the client to send first (chelsio cannot use a 0B write for this). The chelsio FW will defer the ESTABLISHED event until the 0B read is received and responded to. The final proper device-independent solution to this will be done in the rdma-cma, the iwarp core and iwarp devices for upstream inclusion as well as for ofed-1.4. Its a much bigger change and will affect the ABI for the rdma_cm probably (app can request p2p behavior). There was a thread a while back driven by Arkady at NetApp with details on how we will implement this (using a small protocol in mpa start req/rep to negotiate this p2p mode). Stay tuned for more on this. Steve. Steve Wise wrote: > I can reproduce this. Lemme dig into it... > > Steve. > > > Scott Weitzenkamp (sweitzen) wrote: >> Steve, >> >> Thanks, that gets further, but dtest still fails. >> >> Client side: >> >> [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 >> 13926 Running as client - OpenIB-cma >> 13926 Server Name: 192.168.0.198 >> 13926 Server Net Address: 192.168.0.198 >> 13926 Waiting for connect response >> 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE >> 13926 Error connect_ep: DAT_ABORT >> >> 13926: DAPL Test Complete. 
>> >> 13926: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 >> usec, pc= >> 0 >> 13926: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 >> usec, pc= >> 0 >> 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 13926: open: 36619.19 usec >> 13926: close: 32500.98 usec >> 13926: PZ create: 7.87 usec >> 13926: PZ free: 4.05 usec >> 13926: LMR create: 58.89 usec >> 13926: LMR free: 11.92 usec >> 13926: EVD create: 9.78 usec >> 13926: EVD free: 14.07 usec >> 13926: EP create: 78.92 usec >> 13926: EP free: 26.23 usec >> 13926: TOTAL: 199.79 usec >> >> Server side: >> >> [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest >> 11461 Running as server - OpenIB-cma >> 11461 Server waiting for connect request.. >> 11461 Waiting for connect response >> >> 11461 CONNECTED! >> >> 11461 Send RMR to remote: snd_msg: >> r_key_ctx=bff,pad=0,va=146db580,len=0x40 >> 11461 Waiting for remote to send RMR data >> 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED >> 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED >> >> 11461: DAPL Test Complete. 
>> >> 11461: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 >> usec, pc= >> 0 >> 11461: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 >> usec, pc= >> 0 >> 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 11461: open: 900676.01 usec >> 11461: close: 31543.97 usec >> 11461: PZ create: 7.87 usec >> 11461: PZ free: 5.01 usec >> 11461: LMR create: 51.98 usec >> 11461: LMR free: 12.16 usec >> 11461: EVD create: 10.97 usec >> 11461: EVD free: 12.87 usec >> 11461: EP create: 77.01 usec >> 11461: EP free: 30.04 usec >> 11461: TOTAL: 195.03 usec >> >> Scott >> >> >> >>> -----Original Message----- >>> From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Thursday, >>> April 03, 2008 9:19 AM >>> To: Scott Weitzenkamp (sweitzen) >>> Cc: Joshua Bernstein; OpenFabrics EWG; [ofa_general] >>> Subject: Re: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? 
>>> >>> >>> >>> Scott Weitzenkamp (sweitzen) wrote: >>>> I tried that, and it didn't work: >>>> >>>> [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >>>> OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 >>> dapl.1.2 "eth2 0" >>>> "" >>>> [root at svbu-qa2950-1 ~]# dtest >>>> 10194 Running as server - OpenIB-cma >>>> 10194 Error dat_ep_create: DAT_INVALID_HANDLE >>>> 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >>>> >>> try setting DAPL_MAX_INLINE=64 >>> >>> > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From sashak at voltaire.com Thu Apr 3 14:35:11 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 3 Apr 2008 21:35:11 +0000 Subject: [ofa-general] [ANNOUNCE] management tarballs release Message-ID: <20080403213511.GF5982@sashak.voltaire.com> Hi, There is a new release of the management (OpenSM and infiniband diagnostics) tarballs available in: http://www.openfabrics.org/downloads/management/ md5sum: b398ef1246a392338053c8e382b3e6ee libibcommon-1.1.0.tar.gz abce72fbb91530a97493eba7a28a0de6 libibumad-1.2.0.tar.gz fe7a6b80b28e56cf74ffbe09c8819c71 libibmad-1.2.0.tar.gz b0695f75cda10051c8846fd22b77491a opensm-3.2.1.tar.gz 73218ddc536acaaab240a9d51bcd133e infiniband-diags-1.4.0.tar.gz All component versions are from recent master branch. Full change log is below. Sasha Al Chu (6): note cbb means constant bisection bandwidth opensm: multi lid routing balancing for updn/minhop Opensm: minor code cleanup Opensm: switchbalance console option opensm: add lidbalance command to console opens: fix trivial ftree comments Albert Chu (2): check_lft_balance script opensm: enforce routing paths rebalancing on switch reconnection (part 2) Albert L. 
Chu (2):
      handle routers in switchbalance console command
      add router support to check_lft_balance.pl

Dotan Barak (1):
      management: Remove extraneous semicolon from several files

Hal Rosenstock (10):
      OpenSM: Set packet life time to subnet timeout option rather than default
      infiniband-diags: Fix install of IBswcountlimits.pm script
      opensm/osm_sw_info_rcv.c: Clarify LinearFDBTop correction log message
      OpenSM release notes: Clarify QoS firmware support
      OpenSM/osm_subnet.c: Cosmetic changes to options file
      OpenSM release notes: Add byacc as alternative to bison for qos parser
      opensm/doc/partition-config.txt: Update default file name
      OpenSM release notes: Add in new QLogic HCAs
      infiniband-diags/ibping.c: Remove extraneous semicolon
      infiniband-diags/vendstat.c: Fix port xmit wait handling

Ira Weiny (17):
      opensm/libvendor/osm_vendor_ibumad.c: Fix print of Transaction ID
      Fix 2 potential core dumps now that osm_node_get_physp_ptr can return NULL
      opensm/libvendor/osm_vendor_ibumad.c: add transaction ID printing to error messages
      Create script to automate perltidy command
      opensm/libvendor/osm_vendor_ibumad.c: Add environment variable control for OSM_UMAD_MAX_PENDING
      infiniband-diags/scripts/ibprintswitch.pl: fix printing of ports
      Fix bug which prevented some GUIDs from being found due to formating issues.
      infiniband-diags/scripts/ib[linkinfo][queryerrors].pl: report switch not found
      Update documentation for guid format
      Rename ib_gid_t in mad.h to mad_gid_t to prevent name collision with ib_types.h
      opensm/include/iba/ib_types.h: fix DataDetails definitions based on 1.2 and 1.2.1 specification
      opensm/include/iba/ib_types.h: update Notice DataDetails for Trap 144 to 1.2.1
      Ensure ownership of the /etc/opensm directory
      infiniband-diags/scripts/set_nodedesc.sh: enhance to be able to set names other than hostname and to provide feedback on the names assigned
      Add an optional test utility 'ibsendtrap'
      Add mcm_rereg_test to test-utils option.
      opensm/opensm/osm_trap_rcv.c: respond to new trap 144 node description update flag

Jeremy Brown (1):
      ibstatus - small script change

Sasha Khapyorsky (78):
      opensm: remove redundant moving_to_master flag
      opensm: kill drop_mgr, link_mgr and mcast_mgr SM sub-objects
      opensm: remove unused header files
      opensm: indentation fixes
      opensm/osm_sminfo_rcv.c: comments fixing
      opensm/osm_helper.c: make some static
      opensm/osm_sm_state_mgr: remove unused function
      opensm: indentation fixes
      opensm: label indentation fixes
      opensm/osm_console.c: indentation fixes
      opensm/osm_console.c: fix unused func warning
      opensm: drop unused parameter in OSM_LOG_ENTER macro
      opensm/osm_log: OSM_LOG() macro
      opensm: convert to OSM_LOG() macro
      opensm: Release Notes for 3.1.9
      opensm/doc: Remove list of ofed-1.2 bug fixes from OpenSM Release notes.
      opensm/osm_node: trivial code consolidation
      opensm/osm_sa_pkey_record: fix typo
      opensm: fix potential core dumps
      opensm: check p_physp for null before using
      opensm/osm_sa_slvl_record.c: fix typo in log print
      opensm/libvendor: use CL_HTON64() macro for constant conversion
      opensm/osm_vendor_ibumad: simplify put_madw() prototype
      opensm/osm_switch.c: comment typo fixing
      opensm: rename OpenSM startup script to opensmd
      opensm/scripts: rename all opensm scripts as *.in
      opensm/scripts: make configurable scripts
      opensm/doc: rename OpenSM Release notes to 3.1.10
      opensm: consolidate osm_sa_vendor_send() status check
      opensm: move osm_sa_send_error() to osm_sa.c file
      opensm: cosmetic code clean in SA area
      opensm/osm_sa_service_record.c: remove unneeded braces
      libvendor/osm_vendor_ibumad_sa.c: cosmetic
      opensm: consolidate SA response sending code over SA processors
      opensm: rename osm_sa_vendor_send() to osm_sa_send()
      opensm: set SA attribute offset to 0 when no records are returned
      opensm: enforce routing paths rebalancing on switch reconnection
      opensm/osm_sw_info_rcv.c: cosmetic formatting fix
      opensm: release notes update
      opensm/osm_ucast_mgr: make error code uniq
      opensm/osm_switch.h: use tab instead of space charaters
      opensm/osm_dump: dump fixes
      opensm/osm_ucast_updn.c: decrease noisy ranking debug prints
      opensm: in UP/DOWN algo compare GUID values in host byte order
      saquery: trivial: remove empty line
      infiniband-diags/ibsendtrap.c: add include files
      infiniband-diags/ibsendtrap.c: indentation fixes
      opensm: updn/connect_roots: preserve connectivity to root nodes
      opensm/osm_mcast_mgr: limit spanning tree creation recursion to max hops (64)
      opensm: minor memory leak fix
      opensm/osm_trap_rcv: remove unused variable
      opensm: trivial: fix in commented functions
      opensm: switch LFTs incremental update fix
      opensm: send trap 64 only after new ports are in ACTIVE state.
      opensm: osm_dump_qmap_to_file() function
      opensm/updn: dump used root nodes guid
      opensm: unify dumpers, use fprintf() every there
      opensm: remove not used osm_log_printf() function
      opensm: update copyright dates after recent changes
      complib/nodenamemap: add generic parse_node_map() function
      opensm/updn: use parse_node_map() for root node guids file processing
      opensm/updn: update root nodes at each run
      opensm/ftree: use parse_node_map() for guids file processing
      opensm: remove unused osm_ucast_mgr_read_guid_file()
      opensm/updn: --ids_guid_file - node guids to ids map
      libibmad/dump: support VLArb table size, fix printing
      infiniband-diags: pass valid VLArb table size to dump func
      libibumad: eliminate compile warning
      opensm: remove duplicated osm_subn_set_default_opt() prototype
      opensm/configure.in: fix typo
      opensm/scripts/opensmd.in: fix typo
      opensm: make formats of node map names and up/down guid ids files identical
      complib/nodenamemap: merge file parsers
      opensm/configure.in: improve readability of configured config files
      opensm/configure.in: replace CONF_DIR config var by OSM_CONFIG_DIR
      opensm/configure.in: make prefix routes config file configurable
      opensm/osm_base.h: use OPENSM_COFNIG_DIR in config files paths definitions
      management: bump all versions

Timothy A. Meier (2):
      opensm:osm_console cleanup, rename, reorg, no new functionality
      opensm: console split console into two modules

Yevgeny Kliteynik (11):
      opensm/scripts: Fixing location of generated opensm.init script
      opensm/doc: fixing version in release notes
      opensm/man: added -Y/--qos_policy_file option to OSM man
      opensm/osm_subnet.{c,h}: osm_get_port_by_guid takes guid in network order
      opensm/osm_qos_parser: fixed compilation on byacc
      opensm/configure.in: make lex/yacc presence mandatory
      infiniband-diags/Makefile.am: fix 'make install'
      infiniband-diags/saquery: print SL in MCast groups
      opensm/osm_partition.h: trivial - fixing pkey order in struct
      OpenSM release notes
      opensm/QoS: setting SL in the IPoIB MCast groups

From clameter at sgi.com Thu Apr 3 12:14:24 2008 From: clameter at sgi.com (Christoph Lameter) Date: Thu, 3 Apr 2008 12:14:24 -0700 (PDT) Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: <1207219246.8514.817.camel@twins> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402212515.GS19189@duo.random> <1207219246.8514.817.camel@twins> Message-ID: On Thu, 3 Apr 2008, Peter Zijlstra wrote: > It seems to me that common code can be shared using functions? No need > to stuff everything into a single function. We have method vectors all > over the kernel, we could do a_ops as a single callback too, but we > dont. > > FWIW I prefer separate methods. Ok. It seems that I already added some new methods which do not use all parameters. So let's switch back to the old scheme for the next release.
From clameter at sgi.com Thu Apr 3 12:20:41 2008 From: clameter at sgi.com (Christoph Lameter) Date: Thu, 3 Apr 2008 12:20:41 -0700 (PDT) Subject: [ofa-general] Re: EMM: disable other notifiers before register and unregister In-Reply-To: <20080403151908.GB9603@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> Message-ID: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > My attempt to fix this once and for all is to walk all vmas of the > "mm" inside mmu_notifier_register and take all anon_vma locks and > i_mmap_locks in virtual address order in a row. It's ok to take those > inside the mmap_sem. Supposedly if anybody will ever take a double > lock it'll do in order too. Then I can dump all the other locking and What about concurrent mmu_notifier registrations from two mm_structs that have shared mappings? Isn't there a potential deadlock situation? > faults). So it should be ok to take all those locks inside the > mmap_sem and implement a lock_vm(mm) unlock_vm(mm). I'll think more > about this hammer approach while I try to implement it... Well, good luck. Hopefully we will get to something that works. From sferris at acm.org Thu Apr 3 12:45:00 2008 From: sferris at acm.org (Scott M. Ferris) Date: Thu, 3 Apr 2008 14:45:00 -0500 Subject: [ofa-general] [ANNOUNCE] management tarballs release In-Reply-To: <20080403213511.GF5982@sashak.voltaire.com> References: <20080403213511.GF5982@sashak.voltaire.com> Message-ID: <20080403194500.GA33401@sferris.acm.org> On Thu, Apr 03, 2008 at 09:35:11PM +0000, Sasha Khapyorsky wrote: > Hi, > > There is a new release of the management (OpenSM and infiniband > diagnostics) tarballs available in: I get compile errors for opensm-3.2.1 because osm_console_io.h is missing. Does the make dist target need to be updated to put that file in the tarball?
In file included from main.c:61: ../include/opensm/osm_opensm.h:56:35: error: opensm/osm_console_io.h: No such file or directory If you're going to respin the package for that, could you also do a quick test of opensm with no IB cable attached to the HCA? I found that opensm 3.2.0 would spin and hog a CPU when there was no cable attached. It's a pathological case, but sometimes happens in my lab. -- Scott M. Ferris, sferris at acm.org From bs at q-leap.de Thu Apr 3 13:18:16 2008 From: bs at q-leap.de (Bernd Schubert) Date: Thu, 3 Apr 2008 22:18:16 +0200 Subject: [ofa-general] [ANNOUNCE] management tarballs release In-Reply-To: <20080403194500.GA33401@sferris.acm.org> References: <20080403213511.GF5982@sashak.voltaire.com> <20080403194500.GA33401@sferris.acm.org> Message-ID: <200804032218.17413.bs@q-leap.de> On Thursday 03 April 2008 21:45:00 Scott M. Ferris wrote: > On Thu, Apr 03, 2008 at 09:35:11PM +0000, Sasha Khapyorsky wrote: > > Hi, > > > > There is a new release of the management (OpenSM and infiniband > > diagnostics) tarballs available in: > > I get compile errors for opensm-3.2.1 because osm_console_io.h is > missing. Does the make dist target need to be updated to put that > file in the tarball? > > In file included from main.c:61: > ../include/opensm/osm_opensm.h:56:35: error: opensm/osm_console_io.h: No > such file or directory Same here, you can get the file from this link: http://www.openfabrics.org/git/?p=~sashak/management.git;a=tree;f=opensm/include/opensm;h=7dc361f88e573927627c9a394eab4bd95011ee8b;hb=HEAD Cheers, Bernd -- Bernd Schubert Q-Leap Networks GmbH From sashak at voltaire.com Thu Apr 3 17:01:50 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 4 Apr 2008 00:01:50 +0000 Subject: [ofa-general] [ANNOUNCE] management tarballs release In-Reply-To: <20080403194500.GA33401@sferris.acm.org> References: <20080403213511.GF5982@sashak.voltaire.com> <20080403194500.GA33401@sferris.acm.org> Message-ID: <20080404000150.GA8334@sashak.voltaire.com> On 14:45 Thu 03 Apr , Scott M. Ferris wrote: > > I get compile errors for opensm-3.2.1 because osm_console_io.h is > missing. Does the make dist target need to be updated to put that > file in the tarball? Sure, it should be. I will re upload fixed tarball. > If you're going to respin the package for that, could you also do a > quick test of opensm with no IB cable attached to the HCA?
Unfortunately I cannot do it now - don't have any equipment available. > I found > that opensm 3.2.0 would spin and hog a CPU when there was no cable > attached. It's a pathological case, but sometimes happens in my lab. Thanks for reporting (although it would be better to have this report right after 3.2.0). I will look at this after Sonoma. Sasha From sashak at voltaire.com Thu Apr 3 17:10:49 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 4 Apr 2008 00:10:49 +0000 Subject: [ofa-general] Re: [ANNOUNCE] management tarballs release In-Reply-To: <20080403213511.GF5982@sashak.voltaire.com> References: <20080403213511.GF5982@sashak.voltaire.com> Message-ID: <20080404001049.GB8334@sashak.voltaire.com> On 21:35 Thu 03 Apr , Sasha Khapyorsky wrote: > Hi, > > There is a new release of the management (OpenSM and infiniband > diagnostics) tarballs available in: > > http://www.openfabrics.org/downloads/management/ > > md5sum: > > b398ef1246a392338053c8e382b3e6ee libibcommon-1.1.0.tar.gz > abce72fbb91530a97493eba7a28a0de6 libibumad-1.2.0.tar.gz > fe7a6b80b28e56cf74ffbe09c8819c71 libibmad-1.2.0.tar.gz > b0695f75cda10051c8846fd22b77491a opensm-3.2.1.tar.gz OpenSM tarball was replaced by: 997d10f81896a0d70e0f21f0e78eca92 opensm-3.2.1.tar.gz (due to compilation issue). Sorry about inconsistency. Sasha From sashak at voltaire.com Thu Apr 3 17:11:38 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 4 Apr 2008 00:11:38 +0000 Subject: [ofa-general] [ANNOUNCE] management tarballs release In-Reply-To: <200804032218.17413.bs@q-leap.de> References: <20080403213511.GF5982@sashak.voltaire.com> <20080403194500.GA33401@sferris.acm.org> <200804032218.17413.bs@q-leap.de> Message-ID: <20080404001138.GC8334@sashak.voltaire.com> On 22:18 Thu 03 Apr , Bernd Schubert wrote: > > > > In file included from main.c:61: > > ../include/opensm/osm_opensm.h:56:35: error: opensm/osm_console_io.h: No > > such file or directory > > Same here, Should be fixed now. 
Sasha From rdreier at cisco.com Thu Apr 3 14:24:03 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 14:24:03 -0700 Subject: [ofa-general] [PATCH/RFC 1/2] IB/core: Add support for "send with invalidate" work requests In-Reply-To: (Roland Dreier's message of "Tue, 01 Apr 2008 20:41:57 -0700") References: Message-ID: OK here's an updated series of the kernel side, with the invalidate stuff moved to a new opcode. I also decided after thinking about it that I liked Eli's suggestion of putting the invalidate rkey in a union with imm_data. This won't work for libibverbs where we have to preserve the API but I guess we can burn that bridge when we come to it... Any further suggestions? Thanks! --- Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a "send with invalidate" work request as defined in the iWARP verbs and the InfiniBand base memory management extensions. Also put "imm_data" and a new "invalidate_rkey" member in a new "ex" union in struct ib_send_wr. The invalidate_rkey member can be used to pass in an R_Key/STag to be invalidated. Add this new union to struct ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in ib_uverbs_post_send(). Fix up low-level drivers to deal with the change to struct ib_send_wr, and just remove the imm_data initialization from net/sunrpc/xprtrdma/, since that code never does any send with immediate operations. Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since the iWARP drivers currently in the tree set the bit. The amso1100 driver at least will silently fail to honor the IB_SEND_INVALIDATE bit if passed in as part of userspace send requests (since it does not implement kernel bypass work request queueing). Remove the flag from all existing drivers that set it until we know which ones are OK. 
The values chosen for the new flag is not consecutive to avoid clashing with flags defined in the XRC patches, which are not merged yet but which are already in use and are likely to be merged soon. This resurrects a patch sent long ago by Mikkel Hagen . Signed-off-by: Roland Dreier --- drivers/infiniband/core/uverbs_cmd.c | 13 +++++++++++-- drivers/infiniband/hw/amso1100/c2_rnic.c | 2 +- drivers/infiniband/hw/cxgb3/iwch_provider.c | 3 +-- drivers/infiniband/hw/cxgb3/iwch_qp.c | 4 ++-- drivers/infiniband/hw/ipath/ipath_rc.c | 8 ++++---- drivers/infiniband/hw/ipath/ipath_ruc.c | 4 ++-- drivers/infiniband/hw/ipath/ipath_uc.c | 8 ++++---- drivers/infiniband/hw/ipath/ipath_ud.c | 4 ++-- drivers/infiniband/hw/mlx4/qp.c | 4 ++-- drivers/infiniband/hw/mthca/mthca_qp.c | 6 +++--- drivers/infiniband/hw/nes/nes_hw.c | 2 +- include/rdma/ib_user_verbs.h | 5 ++++- include/rdma/ib_verbs.h | 11 ++++++++--- net/sunrpc/xprtrdma/verbs.c | 1 - 14 files changed, 45 insertions(+), 30 deletions(-) diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 9e98cec..2c3bff5 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -1463,7 +1463,6 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file, next->num_sge = user_wr->num_sge; next->opcode = user_wr->opcode; next->send_flags = user_wr->send_flags; - next->imm_data = (__be32 __force) user_wr->imm_data; if (is_ud) { next->wr.ud.ah = idr_read_ah(user_wr->wr.ud.ah, @@ -1476,14 +1475,24 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file, next->wr.ud.remote_qkey = user_wr->wr.ud.remote_qkey; } else { switch (next->opcode) { - case IB_WR_RDMA_WRITE: case IB_WR_RDMA_WRITE_WITH_IMM: + next->ex.imm_data = + (__be32 __force) user_wr->ex.imm_data; + case IB_WR_RDMA_WRITE: case IB_WR_RDMA_READ: next->wr.rdma.remote_addr = user_wr->wr.rdma.remote_addr; next->wr.rdma.rkey = user_wr->wr.rdma.rkey; break; + case IB_WR_SEND_WITH_IMM: + next->ex.imm_data = + 
(__be32 __force) user_wr->ex.imm_data; + break; + case IB_WR_SEND_WITH_INV: + next->ex.invalidate_rkey = + user_wr->ex.invalidate_rkey; + break; case IB_WR_ATOMIC_CMP_AND_SWP: case IB_WR_ATOMIC_FETCH_AND_ADD: next->wr.atomic.remote_addr = diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index 7a62552..b1441ae 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -455,7 +455,7 @@ int __devinit c2_rnic_init(struct c2_dev *c2dev) IB_DEVICE_CURR_QP_STATE_MOD | IB_DEVICE_SYS_IMAGE_GUID | IB_DEVICE_ZERO_STAG | - IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW); + IB_DEVICE_MEM_WINDOW); /* Allocate the qptr_array */ c2dev->qptr_array = vmalloc(C2_MAX_CQS * sizeof(void *)); diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 50e1f2a..ca72654 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -1109,8 +1109,7 @@ int iwch_register_device(struct iwch_dev *dev) memcpy(&dev->ibdev.node_guid, dev->rdev.t3cdev_p->lldev->dev_addr, 6); dev->ibdev.owner = THIS_MODULE; dev->device_cap_flags = - (IB_DEVICE_ZERO_STAG | - IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW); + (IB_DEVICE_ZERO_STAG | IB_DEVICE_MEM_WINDOW); dev->ibdev.uverbs_cmd_mask = (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) | diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index bc5d9b0..8891c3b 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -72,7 +72,7 @@ static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, wqe->send.reserved[2] = 0; if (wr->opcode == IB_WR_SEND_WITH_IMM) { plen = 4; - wqe->send.sgl[0].stag = wr->imm_data; + wqe->send.sgl[0].stag = wr->ex.imm_data; wqe->send.sgl[0].len = __constant_cpu_to_be32(0); wqe->send.num_sgle = __constant_cpu_to_be32(0); *flit_cnt = 5; @@ -112,7 +112,7 @@ static int 
iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) { plen = 4; - wqe->write.sgl[0].stag = wr->imm_data; + wqe->write.sgl[0].stag = wr->ex.imm_data; wqe->write.sgl[0].len = __constant_cpu_to_be32(0); wqe->write.num_sgle = __constant_cpu_to_be32(0); *flit_cnt = 6; diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index f765d48..3ea1b31 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -308,7 +308,7 @@ int ipath_make_rc_req(struct ipath_qp *qp) else { qp->s_state = OP(SEND_ONLY_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; } if (wqe->wr.send_flags & IB_SEND_SOLICITED) @@ -346,7 +346,7 @@ int ipath_make_rc_req(struct ipath_qp *qp) qp->s_state = OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE); /* Immediate data comes after RETH */ - ohdr->u.rc.imm_data = wqe->wr.imm_data; + ohdr->u.rc.imm_data = wqe->wr.ex.imm_data; hwords += 1; if (wqe->wr.send_flags & IB_SEND_SOLICITED) bth0 |= 1 << 23; @@ -490,7 +490,7 @@ int ipath_make_rc_req(struct ipath_qp *qp) else { qp->s_state = OP(SEND_LAST_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; } if (wqe->wr.send_flags & IB_SEND_SOLICITED) @@ -526,7 +526,7 @@ int ipath_make_rc_req(struct ipath_qp *qp) else { qp->s_state = OP(RDMA_WRITE_LAST_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; if (wqe->wr.send_flags & IB_SEND_SOLICITED) bth0 |= 1 << 23; diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index a59bdbd..d6f8833 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -310,7 +310,7 @@ again: 
switch (wqe->wr.opcode) { case IB_WR_SEND_WITH_IMM: wc.wc_flags = IB_WC_WITH_IMM; - wc.imm_data = wqe->wr.imm_data; + wc.imm_data = wqe->wr.ex.imm_data; /* FALLTHROUGH */ case IB_WR_SEND: if (!ipath_get_rwqe(qp, 0)) { @@ -339,7 +339,7 @@ again: goto err; } wc.wc_flags = IB_WC_WITH_IMM; - wc.imm_data = wqe->wr.imm_data; + wc.imm_data = wqe->wr.ex.imm_data; if (!ipath_get_rwqe(qp, 1)) goto rnr_nak; /* FALLTHROUGH */ diff --git a/drivers/infiniband/hw/ipath/ipath_uc.c b/drivers/infiniband/hw/ipath/ipath_uc.c index 2dd8de2..bfe8926 100644 --- a/drivers/infiniband/hw/ipath/ipath_uc.c +++ b/drivers/infiniband/hw/ipath/ipath_uc.c @@ -94,7 +94,7 @@ int ipath_make_uc_req(struct ipath_qp *qp) qp->s_state = OP(SEND_ONLY_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; } if (wqe->wr.send_flags & IB_SEND_SOLICITED) @@ -123,7 +123,7 @@ int ipath_make_uc_req(struct ipath_qp *qp) qp->s_state = OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE); /* Immediate data comes after the RETH */ - ohdr->u.rc.imm_data = wqe->wr.imm_data; + ohdr->u.rc.imm_data = wqe->wr.ex.imm_data; hwords += 1; if (wqe->wr.send_flags & IB_SEND_SOLICITED) bth0 |= 1 << 23; @@ -152,7 +152,7 @@ int ipath_make_uc_req(struct ipath_qp *qp) else { qp->s_state = OP(SEND_LAST_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; } if (wqe->wr.send_flags & IB_SEND_SOLICITED) @@ -177,7 +177,7 @@ int ipath_make_uc_req(struct ipath_qp *qp) qp->s_state = OP(RDMA_WRITE_LAST_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; if (wqe->wr.send_flags & IB_SEND_SOLICITED) bth0 |= 1 << 23; diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index de67eed..be9ed78 100644 --- 
a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -95,7 +95,7 @@ static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_swqe *swqe) if (swqe->wr.opcode == IB_WR_SEND_WITH_IMM) { wc.wc_flags = IB_WC_WITH_IMM; - wc.imm_data = swqe->wr.imm_data; + wc.imm_data = swqe->wr.ex.imm_data; } else { wc.wc_flags = 0; wc.imm_data = 0; @@ -326,7 +326,7 @@ int ipath_make_ud_req(struct ipath_qp *qp) } if (wqe->wr.opcode == IB_WR_SEND_WITH_IMM) { qp->s_hdrwords++; - ohdr->u.ud.imm_data = wqe->wr.imm_data; + ohdr->u.ud.imm_data = wqe->wr.ex.imm_data; bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24; } else bth0 = IB_OPCODE_UD_SEND_ONLY << 24; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index f5210c1..38e651a 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -1249,7 +1249,7 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, case IB_WR_SEND_WITH_IMM: sqp->ud_header.bth.opcode = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE; sqp->ud_header.immediate_present = 1; - sqp->ud_header.immediate_data = wr->imm_data; + sqp->ud_header.immediate_data = wr->ex.imm_data; break; default: return -EINVAL; @@ -1492,7 +1492,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, if (wr->opcode == IB_WR_SEND_WITH_IMM || wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) - ctrl->imm = wr->imm_data; + ctrl->imm = wr->ex.imm_data; else ctrl->imm = 0; diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 8433897..b3fd6b0 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -1532,7 +1532,7 @@ static int build_mlx_header(struct mthca_dev *dev, struct mthca_sqp *sqp, case IB_WR_SEND_WITH_IMM: sqp->ud_header.bth.opcode = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE; sqp->ud_header.immediate_present = 1; - sqp->ud_header.immediate_data = wr->imm_data; + sqp->ud_header.immediate_data = 
wr->ex.imm_data; break; default: return -EINVAL; @@ -1679,7 +1679,7 @@ int mthca_tavor_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, cpu_to_be32(1); if (wr->opcode == IB_WR_SEND_WITH_IMM || wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) - ((struct mthca_next_seg *) wqe)->imm = wr->imm_data; + ((struct mthca_next_seg *) wqe)->imm = wr->ex.imm_data; wqe += sizeof (struct mthca_next_seg); size = sizeof (struct mthca_next_seg) / 16; @@ -2020,7 +2020,7 @@ int mthca_arbel_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, cpu_to_be32(1); if (wr->opcode == IB_WR_SEND_WITH_IMM || wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) - ((struct mthca_next_seg *) wqe)->imm = wr->imm_data; + ((struct mthca_next_seg *) wqe)->imm = wr->ex.imm_data; wqe += sizeof (struct mthca_next_seg); size = sizeof (struct mthca_next_seg) / 16; diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 134189d..aa53aab 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -393,7 +393,7 @@ struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8 hw_rev) { nesadapter->base_pd = 1; nesadapter->device_cap_flags = - IB_DEVICE_ZERO_STAG | IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW; + IB_DEVICE_ZERO_STAG | IB_DEVICE_MEM_WINDOW; nesadapter->allocated_qps = (unsigned long *)&(((unsigned char *)nesadapter) [(sizeof(struct nes_adapter)+(sizeof(unsigned long)-1))&(~(sizeof(unsigned long)-1))]); diff --git a/include/rdma/ib_user_verbs.h b/include/rdma/ib_user_verbs.h index 64a721f..8d65bf0 100644 --- a/include/rdma/ib_user_verbs.h +++ b/include/rdma/ib_user_verbs.h @@ -533,7 +533,10 @@ struct ib_uverbs_send_wr { __u32 num_sge; __u32 opcode; __u32 send_flags; - __u32 imm_data; + union { + __u32 imm_data; + __u32 invalidate_rkey; + } ex; union { struct { __u64 remote_addr; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 66928e9..c48f6af 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -94,7 +94,7 
@@ enum ib_device_cap_flags { IB_DEVICE_SRQ_RESIZE = (1<<13), IB_DEVICE_N_NOTIFY_CQ = (1<<14), IB_DEVICE_ZERO_STAG = (1<<15), - IB_DEVICE_SEND_W_INV = (1<<16), + IB_DEVICE_RESERVED = (1<<16), /* old SEND_W_INV */ IB_DEVICE_MEM_WINDOW = (1<<17), /* * Devices should set IB_DEVICE_UD_IP_SUM if they support @@ -105,6 +105,7 @@ enum ib_device_cap_flags { */ IB_DEVICE_UD_IP_CSUM = (1<<18), IB_DEVICE_UD_TSO = (1<<19), + IB_DEVICE_SEND_W_INV = (1<<21), }; enum ib_atomic_cap { @@ -625,7 +626,8 @@ enum ib_wr_opcode { IB_WR_RDMA_READ, IB_WR_ATOMIC_CMP_AND_SWP, IB_WR_ATOMIC_FETCH_AND_ADD, - IB_WR_LSO + IB_WR_LSO, + IB_WR_SEND_WITH_INV, }; enum ib_send_flags { @@ -649,7 +651,10 @@ struct ib_send_wr { int num_sge; enum ib_wr_opcode opcode; int send_flags; - __be32 imm_data; + union { + __be32 imm_data; + u32 invalidate_rkey; + } ex; union { struct { u64 remote_addr; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index ffbf22a..8ea283e 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1573,7 +1573,6 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia, send_wr.sg_list = req->rl_send_iov; send_wr.num_sge = req->rl_niovs; send_wr.opcode = IB_WR_SEND; - send_wr.imm_data = 0; if (send_wr.num_sge == 4) /* no need to sync any pad (constant) */ ib_dma_sync_single_for_device(ia->ri_id->device, req->rl_send_iov[3].addr, req->rl_send_iov[3].length, -- 1.5.4.5 From weiny2 at llnl.gov Thu Apr 3 14:30:54 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 3 Apr 2008 14:30:54 -0700 Subject: [ofa-general] [PATCH] opensm/opensm/osm_perfmgr.c: change log level of counter overflow message Message-ID: <20080403143054.5abc9554.weiny2@llnl.gov> >From 821619569eea5bb116bc30d32ff18491d6953eb2 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Thu, 3 Apr 2008 14:25:52 -0700 Subject: [PATCH] opensm/opensm/osm_perfmgr.c: change log level of counter overflow message Signed-off-by: Ira K. 
Weiny --- opensm/opensm/osm_perfmgr.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_perfmgr.c b/opensm/opensm/osm_perfmgr.c index cc95bee..5c53c24 100644 --- a/opensm/opensm/osm_perfmgr.c +++ b/opensm/opensm/osm_perfmgr.c @@ -984,7 +984,7 @@ osm_perfmgr_check_overflow(osm_perfmgr_t * pm, __monitored_node_t *mon_node, osm_node_t *p_node = NULL; ib_net16_t lid = 0; - osm_log(pm->log, OSM_LOG_INFO, + osm_log(pm->log, OSM_LOG_VERBOSE, "PerfMgr: Counter overflow: %s (0x%" PRIx64 ") port %d; clearing counters\n", mon_node->name, mon_node->guid, port); -- 1.5.1 From rdreier at cisco.com Thu Apr 3 14:40:09 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 14:40:09 -0700 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: (Roland Dreier's message of "Thu, 03 Apr 2008 14:24:03 -0700") References: Message-ID: Handle IB_WR_SEND_WITH_INV work requests. This resurrects a patch sent long ago by Mikkel Hagen . 
Signed-off-by: Roland Dreier --- drivers/infiniband/hw/amso1100/c2_qp.c | 22 +++++++++++++++------- drivers/infiniband/hw/amso1100/c2_rnic.c | 3 ++- 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c index 9190bd5..a6d8944 100644 --- a/drivers/infiniband/hw/amso1100/c2_qp.c +++ b/drivers/infiniband/hw/amso1100/c2_qp.c @@ -811,16 +811,24 @@ int c2_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr, switch (ib_wr->opcode) { case IB_WR_SEND: - if (ib_wr->send_flags & IB_SEND_SOLICITED) { - c2_wr_set_id(&wr, C2_WR_TYPE_SEND_SE); - msg_size = sizeof(struct c2wr_send_req); + case IB_WR_SEND_WITH_INV: + if (ib_wr->opcode == IB_WR_SEND) { + if (ib_wr->send_flags & IB_SEND_SOLICITED) + c2_wr_set_id(&wr, C2_WR_TYPE_SEND_SE); + else + c2_wr_set_id(&wr, C2_WR_TYPE_SEND); + wr.sqwr.send.remote_stag = 0; } else { - c2_wr_set_id(&wr, C2_WR_TYPE_SEND); - msg_size = sizeof(struct c2wr_send_req); + if (ib_wr->send_flags & IB_SEND_SOLICITED) + c2_wr_set_id(&wr, C2_WR_TYPE_SEND_SE_INV); + else + c2_wr_set_id(&wr, C2_WR_TYPE_SEND_INV); + wr.sqwr.send.remote_stag = + cpu_to_be32(ib_wr->ex.invalidate_rkey); } - wr.sqwr.send.remote_stag = 0; - msg_size += sizeof(struct c2_data_addr) * ib_wr->num_sge; + msg_size = sizeof(struct c2wr_send_req) + + sizeof(struct c2_data_addr) * ib_wr->num_sge; if (ib_wr->num_sge > qp->send_sgl_depth) { err = -EINVAL; break; diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index b1441ae..9a054c6 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -455,7 +455,8 @@ int __devinit c2_rnic_init(struct c2_dev *c2dev) IB_DEVICE_CURR_QP_STATE_MOD | IB_DEVICE_SYS_IMAGE_GUID | IB_DEVICE_ZERO_STAG | - IB_DEVICE_MEM_WINDOW); + IB_DEVICE_MEM_WINDOW | + IB_DEVICE_SEND_W_INV); /* Allocate the qptr_array */ c2dev->qptr_array = vmalloc(C2_MAX_CQS * sizeof(void *)); -- 1.5.4.5 From 
sashak at voltaire.com Thu Apr 3 18:10:04 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 4 Apr 2008 01:10:04 +0000 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_perfmgr.c: change log level of counter overflow message In-Reply-To: <20080403143054.5abc9554.weiny2@llnl.gov> References: <20080403143054.5abc9554.weiny2@llnl.gov> Message-ID: <20080404011004.GA8521@sashak.voltaire.com> On 14:30 Thu 03 Apr , Ira Weiny wrote: > From 821619569eea5bb116bc30d32ff18491d6953eb2 Mon Sep 17 00:00:00 2001 > From: Ira K. Weiny > Date: Thu, 3 Apr 2008 14:25:52 -0700 > Subject: [PATCH] opensm/opensm/osm_perfmgr.c: change log level of counter overflow message > > > Signed-off-by: Ira K. Weiny Applied. Thanks. Sasha From rdreier at cisco.com Thu Apr 3 16:06:10 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 16:06:10 -0700 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: (Roland Dreier's message of "Thu, 03 Apr 2008 14:40:09 -0700") References: Message-ID: Thinking about all this send w/ invalidate stuff... Is it worth merging just send w/invalidate for 2.6.26, or should we wait and get all the iWARP verbs/IB base memory management extensions/etc stuff straight and target 2.6.27? - R. From mhagen at iol.unh.edu Thu Apr 3 16:30:38 2008 From: mhagen at iol.unh.edu (Mikkel Hagen) Date: Thu, 03 Apr 2008 19:30:38 -0400 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: References: Message-ID: <47F5689E.90101@iol.unh.edu> There are people waiting for SendINV functionality, if we are comfortable with the state of the patches, I vote we merge sooner than later. 
Mikkel Hagen Project Assistant - Fibre Channel/SAS/SATA Consortiums Research and Development Engineer - iWARP Consortium FC/SAS/SATA:1-603-862-0701 iWARP:1-603-862-5083 Fax:1-603-862-4181 UNH-IOL 121 Technology Drive, Suite 2 Durham, NH 03824 Roland Dreier wrote: > Thinking about all this send w/ invalidate stuff... > > Is it worth merging just send w/invalidate for 2.6.26, or should we wait > and get all the iWARP verbs/IB base memory management extensions/etc > stuff straight and target 2.6.27? > > - R. > From rdreier at cisco.com Thu Apr 3 16:40:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 16:40:17 -0700 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: <47F5689E.90101@iol.unh.edu> (Mikkel Hagen's message of "Thu, 03 Apr 2008 19:30:38 -0400") References: <47F5689E.90101@iol.unh.edu> Message-ID: > There are people waiting for SendINV functionality, if we are > comfortable with the state of the patches, I vote we merge sooner than > later. Who is waiting and how are they going to use it? We don't have any "allocate L_Key" verb implemented now, so it could only possibly work with memory windows. Do any drivers have working memory window support? - R. 
From rdreier at cisco.com Thu Apr 3 16:57:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 16:57:37 -0700 Subject: [ofa-general] [PATCH 0/20] IB/ipath -- DDR HCA patches in for-roland for 2.6.26 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Wed, 02 Apr 2008 15:49:01 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: thanks, applied all 20 From mhagen at iol.unh.edu Thu Apr 3 17:30:42 2008 From: mhagen at iol.unh.edu (Mikkel Hagen) Date: Thu, 03 Apr 2008 20:30:42 -0400 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: References: <47F5689E.90101@iol.unh.edu> Message-ID: <47F576B2.300@iol.unh.edu> We've got an iSER implementation that was hoping to utilize SendINV, and our conformance and interoperability tools have been waiting for support to test. Mikkel Hagen Project Assistant - Fibre Channel/SAS/SATA Consortiums Research and Development Engineer - iWARP Consortium FC/SAS/SATA:1-603-862-0701 iWARP:1-603-862-5083 Fax:1-603-862-4181 UNH-IOL 121 Technology Drive, Suite 2 Durham, NH 03824 Roland Dreier wrote: > > There are people waiting for SendINV functionality, if we are > > comfortable with the state of the patches, I vote we merge sooner than > > later. > > Who is waiting and how are they going to use it? We don't have any > "allocate L_Key" verb implemented now, so it could only possibly work > with memory windows. Do any drivers have working memory window support? > > - R. 
> From rdreier at cisco.com Thu Apr 3 17:52:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 17:52:00 -0700 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: <47F576B2.300@iol.unh.edu> (Mikkel Hagen's message of "Thu, 03 Apr 2008 20:30:42 -0400") References: <47F5689E.90101@iol.unh.edu> <47F576B2.300@iol.unh.edu> Message-ID: > We've got an iSER implementation that was hoping to utilize SendINV, > and our conformance and interoperability tools have been waiting for > support to test. OK, that's good. But does this code start working if we add the two patches I posted? I don't understand how you could do anything useful with the current state of things plus send w/inval for amso1100. I hope I'm not being too difficult here -- but I really would like to understand how the patches that I have are useful as they stand, without some further support for new verbs and/or MW implementations. - R. From rdreier at cisco.com Thu Apr 3 17:56:18 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 17:56:18 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: (Thomas Talpey's message of "Wed, 02 Apr 2008 13:21:39 -0400") References: <47F33837.60701@dev.mellanox.co.il> Message-ID: > drivers/infiniband/hw/ehca/ehca_hca.c 376: > props->max_mw = min_t(unsigned, rblock->max_mw, INT_MAX); > Note, ehca may set it to huge negative values, I think the code is OK as it stands... it takes the minimum (as unsigned int values) of rblock->max_mw and INT_MAX, and returns that. This should be working OK, at least since 76dea3bc ("IB/ehca: Fix clipping of device limits to INT_MAX"). > drivers/infiniband/hw/nes/nes_verbs.c 3915: > props->max_mw = nesibdev->max_mr; > nes puts the wrong value in the attribute field! (typo?) 
I'm not positive but it's plausible that the nes limit on the number of memory windows is the same as its limit on MRs. And nes has an implementation of bind_mw, so it is at least possible that it works. Actually now that I think of it, I have a nes setup where I could test your MW code... what is the sysctl to set? - R. From sfr at canb.auug.org.au Thu Apr 3 19:32:04 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Fri, 4 Apr 2008 13:32:04 +1100 Subject: [ofa-general] linux-next: infiniband build failure Message-ID: <20080404133204.3edc0470.sfr@canb.auug.org.au> Hi Roland, Today's build of linux-next (powerpc ppc64_defconfig) produced this: drivers/infiniband/hw/ehca/ehca_reqs.c: In function 'ehca_write_swqe': drivers/infiniband/hw/ehca/ehca_reqs.c:191: error: 'const struct ib_send_wr' has no member named 'imm_data' Caused by commit 0f2031b6374e693474f01020efeee6e9a00fa918 ("IB/core: Add support for "send with invalidate" work requests"). I applied the patch below but it would be good if it could be merged back into the above commit. 
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ >From cb95023d10e3a6327b6b761f49f3e2e855882e57 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Fri, 4 Apr 2008 13:26:45 +1100 Subject: [PATCH] infiniband-fix-1 Signed-off-by: Stephen Rothwell --- drivers/infiniband/hw/ehca/ehca_reqs.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index 2ce8cff..784461d 100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -188,7 +188,7 @@ static inline int ehca_write_swqe(struct ehca_qp *qp, if (send_wr->opcode == IB_WR_SEND_WITH_IMM || send_wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) { /* this might not work as long as HW does not support it */ - wqe_p->immediate_data = be32_to_cpu(send_wr->imm_data); + wqe_p->immediate_data = be32_to_cpu(send_wr->ex.imm_data); wqe_p->wr_flag |= WQE_WRFLAG_IMM_DATA_PRESENT; } -- 1.5.4.5 From sfr at canb.auug.org.au Thu Apr 3 19:55:32 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Fri, 4 Apr 2008 13:55:32 +1100 Subject: [ofa-general] linux-next: infiniband build failure Message-ID: <20080404135532.70c46480.sfr@canb.auug.org.au> Hi All, Today's build of linux-next (x86_64 allmodconfig) produced this: drivers/infiniband/hw/ipath/ipath_verbs.c: In function 'ipath_register_ib_device': drivers/infiniband/hw/ipath/ipath_verbs.c:2070: error: 'struct ib_device' has no member named 'class_dev' This is caused by the driver-core patch "ib-convert-struct-class_device-to-struct-device.patch" which changes the class_dev member of struct ib_device to "dev" and infiniband commit 63fe2f55dcd6d227bb9dc0aedec4431a9a7a8f92 ("IB/ipath: add calls to new 7220 code and enable in build") which adds another reference to class_dev. I applied the following patch (because reverting the above driver-core patch was too hard). 
I am not sure if it is the correct patch, but it does build. This needs to be sorted out. Greg, could the driver-core patch be delivered through the infiniband tree? (This would, of course cause problems for driver-core-remove-no-longer-used-struct-class_device.patch.) -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ >From 70bb8a344acb62afb33e8c5f96d568aa1382210e Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Fri, 4 Apr 2008 13:43:49 +1100 Subject: [PATCH] infiniband-fix-2 Signed-off-by: Stephen Rothwell --- drivers/infiniband/hw/ipath/ipath_verbs.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 466f3fb..6ac0c5c 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -2067,7 +2067,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd) dev->phys_port_cnt = 1; dev->num_comp_vectors = 1; dev->dma_device = &dd->pcidev->dev; - dev->class_dev.dev = dev->dma_device; + dev->dev.parent = dev->dma_device; dev->query_device = ipath_query_device; dev->modify_device = ipath_modify_device; dev->query_port = ipath_query_port; -- 1.5.4.5 From greg at kroah.com Thu Apr 3 20:10:20 2008 From: greg at kroah.com (Greg KH) Date: Thu, 3 Apr 2008 20:10:20 -0700 Subject: [ofa-general] Re: linux-next: infiniband build failure In-Reply-To: <20080404135532.70c46480.sfr@canb.auug.org.au> References: <20080404135532.70c46480.sfr@canb.auug.org.au> Message-ID: <20080404031020.GB24743@kroah.com> On Fri, Apr 04, 2008 at 01:55:32PM +1100, Stephen Rothwell wrote: > Hi All, > > Today's build of linux-next (x86_64 allmodconfig) produced this: > > drivers/infiniband/hw/ipath/ipath_verbs.c: In function 'ipath_register_ib_device': > drivers/infiniband/hw/ipath/ipath_verbs.c:2070: error: 'struct ib_device' has no member named 'class_dev' > > This is caused by the driver-core patch > 
"ib-convert-struct-class_device-to-struct-device.patch" which changes the > class_dev member of struct ib_device to "dev" and infiniband commit > 63fe2f55dcd6d227bb9dc0aedec4431a9a7a8f92 ("IB/ipath: add calls to new > 7220 code and enable in build") which adds another reference to class_dev. > > I applied the following patch (because reverting the above driver-core > patch was too hard). I am not sure if it is the correct patch, but it > does build This needs to be sorted out. Greg, could the driver-core > patch be delivered through the infiniband tree? (This would, of course > cause problems for > driver-core-remove-no-longer-used-struct-class_device.patch.) Your patch looks correct to me. Roland wanted the ib patch to go through my tree, and I figure we will work out these issues during the 2 week merge window. thanks, greg k-h From or.gerlitz at gmail.com Thu Apr 3 21:17:54 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 07:17:54 +0300 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> Message-ID: <15ddcffd0804032117o21e6d62br9def3e46d4d513c4@mail.gmail.com> On Thu, Apr 3, 2008 at 5:40 PM, Tang, Changqing wrote: > The problem is, from MPI side, (and by default), we don't know which port is on which > fabric, since the subnet prefix is the same. We rely on system admin to config two > different subnet prefixes for HP-MPI to work. > > No vendor has claimed to support this. CQ, not supporting a different subnet prefix per IB subnet is against IB nature; I don't think there should be any problem configuring a different prefix at each open SM instance, and the Linux host stack would work perfectly under this config. If you are aware of any problem in the opensm and/or the host stack, please let the community know and the maintainers will fix it. 
Or. From or.gerlitz at gmail.com Thu Apr 3 21:22:35 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 07:22:35 +0300 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: <15ddcffd0804032122i2993bd00x84d9d38d2b7f34ba@mail.gmail.com> On Thu, Apr 3, 2008 at 5:53 PM, Tang, Changqing wrote: > for example, in MPI, process A know the HCA guid on another node. After running for > some time, the switch is restarted for some reason, and the whole fabric is re-configured. CQ, if by "the whole fabric is re-configured" you refer to a case where a subnet prefix changes while a job runs and a process is detached/reattached to the job, then adapting your design to handle it is over-engineering; why would you want to do that? Or. From or.gerlitz at gmail.com Thu Apr 3 21:47:40 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 07:47:40 +0300 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F4F526.3060709@opengridcomputing.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> Message-ID: <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> On Thu, Apr 3, 2008 at 6:17 PM, Steve Wise wrote: > I think RDS might be getting confused because the 10GbE rnic shows up as a > dumb NIC hooked into the native TCP stack -and- an rdma device. > Jon Mason will be working to enable RDS soon on the chelsio device. He'll > feed back the changes needed, if any, to RDS. Stay tuned. Steve, I understand that similar work has been done, at least to some extent, with open MPI, and I will be very happy to hear the lessons learned. Did you manage to have the same (say point to point) open mpi "transport" design/code work over rdma-cm over both IB and iWARP? 
Can someone from OGC or Chelsio drive a BOF on that in Sonoma? If not, can some notes be sent to the list? I say lets learn from what you did so far... Or. From richard.frank at oracle.com Thu Apr 3 22:52:01 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 00:52:01 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: <47F5C201.6080305@oracle.com> having a BOF at Sonoma - and or circulating a cheat sheet of what to watch out for would be very handy - indeed :) Or Gerlitz wrote: > On Thu, Apr 3, 2008 at 6:17 PM, Steve Wise wrote: > >> I think RDS might be getting confused because the 10GbE rnic shows up as a >> dumb NIC hooked into the native TCP stack -and- an rdma device. >> > > >> Jon Mason will be working to enable RDS soon on the chelsio device. He'll >> feed back the changes needed, if any, to RDS. Stay tuned. >> > > Steve, > > I understand that a similar work has been done at least to some extent > with open MPI, and I will be > very happy to hear the lessons learned. Did you manage to have the > same (say point to point) > open mpi "transport" design/code work over rdma-cm over both IB and iWARP? > > Can someone from OGC or Chelsio drive a BOF on that in Sonoma? > > If not, can some notes be sent to the list? I say lets learn from what > you did so far... > > Or. 
> From or.gerlitz at gmail.com Thu Apr 3 22:54:29 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 08:54:29 +0300 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F37CA4.8000109@mellanox.co.il> References: <47F37CA4.8000109@mellanox.co.il> Message-ID: <15ddcffd0804032254t4533d41br671edf335c6daabb@mail.gmail.com> On Wed, Apr 2, 2008 at 3:31 PM, Tziporet Koren wrote: > We want to add send with invalidate > Eli will be able to send the patches next week and since they are small I think they can be in for 2.6.26 Does send with invalidate apply to rkeys generated through the proprietary FMR API? If not, what usage do you envision for the new verb on today's IB devices? Or. From bs at q-leap.de Fri Apr 4 02:23:59 2008 From: bs at q-leap.de (Bernd Schubert) Date: Fri, 4 Apr 2008 11:23:59 +0200 Subject: [ofa-general] [PATCH] parse_node_map: print parse errors Message-ID: <200804041124.00004.bs@q-leap.de> Hello, could you please add the patch below; without it I probably never would have realized why my node name map was not accepted. Btw, I'm a bit surprised there don't seem to be any default wrappers for fopen(), fclose(), malloc(), fprintf(), etc. 
diff -rup opensm-3.2.1.old/complib/cl_nodenamemap.c opensm-3.2.1/complib/cl_nodenamemap.c --- opensm-3.2.1.old/complib/cl_nodenamemap.c 2008-04-03 13:17:35.000000000 +0200 +++ opensm-3.2.1/complib/cl_nodenamemap.c 2008-04-04 11:09:42.000000000 +0200 @@ -55,8 +55,11 @@ static int map_name(void *cxt, uint64_t return 0; item = malloc(sizeof(*item)); - if (!item) + if (!item) { + fprintf(stderr, "Malloc failed, sizeof(*item) = %zu.\n", sizeof(*item)); return -1; + } + item->guid = guid; item->name = strdup(p); cl_qmap_insert(map, item->guid, (cl_map_item_t *)item); @@ -169,6 +172,8 @@ int parse_node_map(const char *file_name guid = strtoull(p, &e, 0); if (e == p || (!isspace(*e) && *e != '#' && *e != '\0')) { fclose(f); + fprintf (stderr, "%s: Parse error in line: %s\n", + __func__, line); return -1; } Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From bs at q-leap.de Fri Apr 4 02:47:27 2008 From: bs at q-leap.de (Bernd Schubert) Date: Fri, 4 Apr 2008 11:47:27 +0200 Subject: [ofa-general] ERR 0108: Unknown remote side Message-ID: <200804041147.27565.bs@q-leap.de> Hello, opensm-3.2.1 logs some error messages like this: Apr 04 00:00:08 325114 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node 0x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 hop path: Path = 0,1,14,13 From ibnetdiscover output I see port 13 of this switch is a switch-interconnect (sorry, I don't know what the correct name/identifier is for switches within switches): [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid 263 4xSDR Apr 04 00:00:08 325219 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node 0x000b8cffff002bf9(SW_pfs1_inter6) port 9. 
Adding to light sweep sampling list Apr 04 00:00:08 325234 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: Path = 0,1,18 This is again an interconnection: [9] "S-000b8cffff002b9e"[15] # "SW_pfs1_leaf1" lid 177 4xDDR Apr 04 00:00:08 325288 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node 0x000b8cffff002bfa(SW_pfs1_inter7) port 13. Adding to light sweep sampling list Apr 04 00:00:08 325301 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: Path = 0,1,14 And again an interconnection: [13] "S-000b8cffff002ba2"[13] # "SW_pfs1_leaf4" lid 182 4xDDR All the other interconnections seem to be fine. Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From andrea at qumranet.com Fri Apr 4 05:30:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 4 Apr 2008 14:30:40 +0200 Subject: [ofa-general] Re: EMM: disable other notifiers before register and unregister In-Reply-To: References: <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> Message-ID: <20080404123040.GC10185@duo.random> On Thu, Apr 03, 2008 at 12:20:41PM -0700, Christoph Lameter wrote: > On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > > > My attempt to fix this once and for all is to walk all vmas of the > > "mm" inside mmu_notifier_register and take all anon_vma locks and > > i_mmap_locks in virtual address order in a row. It's ok to take those > > inside the mmap_sem. Supposedly if anybody will ever take a double > > lock it'll do in order too. Then I can dump all the other locking and > > What about concurrent mmu_notifier registrations from two mm_structs > that have shared mappings? Isnt there a potential deadlock situation? No, the ordering of the lock avoids that. Here is a snippet. /* * This operation locks against the VM for all pte/vma/mm related * operations that could ever happen on a certain mm. 
This includes * vmtruncate, try_to_unmap, and all page faults. The holder * must not hold any mm related lock. A single task can't take more * than one mm lock in a row or it would deadlock. */ So you can't do: mm_lock(mm1); mm_lock(mm2); But if two different tasks run the mm_lock everything is ok. Each task in the system can lock at most 1 mm at a time. > Well good luck. Hopefully we will get to something that works. Looks good so far but I didn't finish it yet. From Brian.Murrell at Sun.COM Fri Apr 4 07:36:59 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 10:36:59 -0400 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? Message-ID: <1207319819.1750.72.camel@pc.ilinx> I'm trying to get a few nodes here connected with IPoIB. On the first node I have tried with, after ifconfig'ing the interface into the network with other IPoIB nodes I cannot seem to ping any other nodes. 
I ran ibdiagnet and got a /tmp/ibdiagnet.pkey file with the following contents: sata14:/ # cat /tmp/ibdiagnet.pkey GROUP PKey:0x7fff Hosts:4 Full sata15/P2 lid=0x0004 guid=0x00066a01a0000363 dev=23108 Full sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 Full sata23/P2 lid=0x0008 guid=0x00066a01a00002fe dev=23108 Full sata16/P2 lid=0x0007 guid=0x00066a01a00002c1 dev=23108 When I run an "ibdiagpath -l 0x0004" I get the following: -W- Topology file is not specified. Reports regarding cluster links will use direct routes. -I- Using port 2 as the local port. -I--------------------------------------------------- -I- Traversing the path from local to destination -I--------------------------------------------------- -I- From: lid=0x0006 guid=0x00066a01a00002bf dev=23108 sata14/P2 -I- To: lid=0x0001 guid=0x00066a00c8000180 dev=5 Port=1 -I- From: lid=0x0001 guid=0x00066a00c8000180 dev=5 Port=2 -I- To: lid=0x0004 guid=0x00066a01a0000363 dev=23108 sata15/P2 -I--------------------------------------------------- -I- PM Counters Info -I--------------------------------------------------- -I- No illegal PM counters values were found -I--------------------------------------------------- -I- Path Partitions Report -I--------------------------------------------------- -I- Source sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 Port 2 PKeys:0xffff -I- Destination sata15 lid=0x0004 guid=0x00066a01a0000363 dev=23108 PKeys:0xffff -I- Path shared PKeys: 0xffff -I--------------------------------------------------- -I- IPoIB Path Check -I--------------------------------------------------- -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000000 MTU:2048Byte rate:10Gbps SL:0x00 -W- Port sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 can not join due to rate:2.5Gbps < group:10Gbps -W- Port sata15/P2 lid=0x0004 guid=0x00066a01a0000363 dev=23108 can not join due to rate:2.5Gbps < group:10Gbps -E- No IPoIB Subnets found on Path! Nodes can not communicate via IPoIB! 
-I--------------------------------------------------- -I- QoS on Path Check -I--------------------------------------------------- -W- Blocked VLs:4 5 at node:sata14 lid=0x0006 guid=0x00066a01a00002bf dev=23108 port:2 -W- Blocked VLs:4 5 at node: lid=0x0001 guid=0x00066a00c8000180 dev=5 port:2 -I- The following SLs can be used:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 -I- Done. Run time was 0 seconds. That IPoIB Path Check looks a bit alarming. Anyone have any suggestions? b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From swise at opengridcomputing.com Fri Apr 4 07:41:55 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 04 Apr 2008 09:41:55 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: <47F63E33.5080709@opengridcomputing.com> Or Gerlitz wrote: > On Thu, Apr 3, 2008 at 6:17 PM, Steve Wise wrote: >> I think RDS might be getting confused because the 10GbE rnic shows up as a >> dumb NIC hooked into the native TCP stack -and- an rdma device. > >> Jon Mason will be working to enable RDS soon on the chelsio device. He'll >> feed back the changes needed, if any, to RDS. Stay tuned. > > Steve, > > I understand that a similar work has been done at least to some extent > with open MPI, and I will be > very happy to hear the lessons learned. Did you manage to have the > same (say point to point) > open mpi "transport" design/code work over rdma-cm over both IB and iWARP? > Definitely. We're running over rdma-cm over mthca and cxgb3 on 2 nodes today. 8 nodes over cxgb3. 
We're working out the details now. > Can someone from OGC or Chelsio drive a BOF on that in Sonoma? > > If not, can some notes be sent to the list? I say lets learn from what > you did so far... We won't be in Sonoma, but perhaps Jon can email some info to the list on what we've done to-date for open mpi. Steve. From hrosenstock at xsigo.com Fri Apr 4 07:55:58 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 07:55:58 -0700 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207319819.1750.72.camel@pc.ilinx> References: <1207319819.1750.72.camel@pc.ilinx> Message-ID: <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 10:36 -0400, Brian J. Murrell wrote: > I'm trying to get a few nodes here connected with IPoIB. On the first > node I have tried with, after ifconfig'ing the interface into the > network with other IPoIB nodes I cannot seem to ping any other nodes. I > ran ibdiagnet and got a /tmp/ibdiagnet.pkey file with the following > contents: > > sata14:/ # cat /tmp/ibdiagnet.pkey > GROUP PKey:0x7fff Hosts:4 > Full sata15/P2 lid=0x0004 guid=0x00066a01a0000363 dev=23108 > Full sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 > Full sata23/P2 lid=0x0008 guid=0x00066a01a00002fe dev=23108 > Full sata16/P2 lid=0x0007 guid=0x00066a01a00002c1 dev=23108 > > When I run an "ibdiagpath -l 0x0004" I get the following: > > -W- Topology file is not specified. > Reports regarding cluster links will use direct routes. > -I- Using port 2 as the local port. 
> > -I--------------------------------------------------- > -I- Traversing the path from local to destination > -I--------------------------------------------------- > -I- From: lid=0x0006 guid=0x00066a01a00002bf dev=23108 sata14/P2 > -I- To: lid=0x0001 guid=0x00066a00c8000180 dev=5 Port=1 > > -I- From: lid=0x0001 guid=0x00066a00c8000180 dev=5 Port=2 > -I- To: lid=0x0004 guid=0x00066a01a0000363 dev=23108 sata15/P2 > > > -I--------------------------------------------------- > -I- PM Counters Info > -I--------------------------------------------------- > -I- No illegal PM counters values were found > > -I--------------------------------------------------- > -I- Path Partitions Report > -I--------------------------------------------------- > -I- Source sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 Port 2 > PKeys:0xffff > -I- Destination sata15 lid=0x0004 guid=0x00066a01a0000363 dev=23108 PKeys:0xffff > -I- Path shared PKeys: 0xffff > > -I--------------------------------------------------- > -I- IPoIB Path Check > -I--------------------------------------------------- > -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000000 MTU:2048Byte rate:10Gbps SL:0x00 > -W- Port sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 can not join due > to rate:2.5Gbps < group:10Gbps > -W- Port sata15/P2 lid=0x0004 guid=0x00066a01a0000363 dev=23108 can not join due > to rate:2.5Gbps < group:10Gbps > -E- No IPoIB Subnets found on Path! Nodes can not communicate via IPoIB! > > -I--------------------------------------------------- > -I- QoS on Path Check > -I--------------------------------------------------- > -W- Blocked VLs:4 5 at node:sata14 lid=0x0006 guid=0x00066a01a00002bf dev=23108 > port:2 > -W- Blocked VLs:4 5 at node: lid=0x0001 guid=0x00066a00c8000180 dev=5 port:2 > -I- The following SLs can be used:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 > > -I- Done. Run time was 0 seconds. > > That IPoIB Path Check looks a bit alarming. > > Anyone have any suggestions? 
Looks like you have a mixed rate set of ports so you need to configure the group to 2.5 Gbps. What SM are you using ? -- Hal > b. > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From Brian.Murrell at Sun.COM Fri Apr 4 08:05:36 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 11:05:36 -0400 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207321536.1750.80.camel@pc.ilinx> On Fri, 2008-04-04 at 07:55 -0700, Hal Rosenstock wrote: > > Looks like you have a mixed rate set of ports so you need to configure > the group to 2.5 Gbps. I'm a bit green with I/B, so please bear with me if you can. I do understand that there can be mixed rates depending on hardware. But the "hardware guys" assure me the cards in these machines should be able to do 10Gbps. Maybe they are wrong. The card is listing as: 06:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) > What SM are you using ? That's a good question. I suspect it's running on the switch. I don't know any details on the switch (yet) though. I will need to engage the hardware folks to determine this. I did get an error when I ran ibdiagnet about more than 1 master SM running when I started opensmd on one of the nodes and none of the other nodes are running an SM so that only leaves the switch. In my limited exposure to IB, running the SM on the switch has always yielded bad results. I will see if I can get them to disable it. b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From hrosenstock at xsigo.com Fri Apr 4 08:08:23 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 08:08:23 -0700 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207321536.1750.80.camel@pc.ilinx> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> Message-ID: <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 11:05 -0400, Brian J. Murrell wrote: > On Fri, 2008-04-04 at 07:55 -0700, Hal Rosenstock wrote: > > > > Looks like you have a mixed rate set of ports so you need to configure > > the group to 2.5 Gbps. > > I'm a bit green with I/B, so please bear with me if you can. I do > understand that there can be mixed rates depending on hardware. But the > "hardware guys" assure me the cards in these machines should be able to > do 10Gbps. Maybe they are wrong. The card is listing as: > > 06:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) Yes, but the multicast groups (which IPoIB uses) need to be a homogeneous rate so either it needs to be lowest common denominator or some nodes will not be able to participate. > > What SM are you using ? > > That's a good question. I suspect it's running on the switch. I don't > know any details on the switch (yet) though. I will need to engage the > hardware folks to determine this. I did get an error when when ran > ibdiagnet about more than 1 master SM running when I started opensmd on > one of the nodes and none of the other nodes are running an SM so that > only leaves the switch. > > In my limited exposure to IB, running the SM on the switch has always > yielded bad results. I will see if I can get them to disable it. That's one choice. The other is to contact your SM (switch) vendor as to how to configure the SM for this. 
Most SMs have some configuration to deal with the situation you are describing. -- Hal > b. > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From changquing.tang at hp.com Fri Apr 4 08:08:33 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Fri, 4 Apr 2008 15:08:33 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <15ddcffd0804032117o21e6d62br9def3e46d4d513c4@mail.gmail.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> <15ddcffd0804032117o21e6d62br9def3e46d4d513c4@mail.gmail.com> Message-ID: What I mean by "claim to support" is to have more people test with this config. --CQ > -----Original Message----- > From: Or Gerlitz [mailto:or.gerlitz at gmail.com] > Sent: Thursday, April 03, 2008 11:18 PM > To: Tang, Changqing > Cc: general at lists.openfabrics.org; ewg at lists.openfabrics.org > Subject: Re: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > On Thu, Apr 3, 2008 at 5:40 PM, Tang, Changqing > wrote: > > > The problem is, from MPI side, (and by default), we don't > know which > > port is on which fabric, since the subnet prefix is the > same. We rely > > on system admin to config two different subnet prefixes > for HP-MPI to work. > > > > No vendor has claimed to support this. > > CQ, not supporting a different subnet prefix per IB subnet is > against IB's nature; I don't think there should be any problem > configuring a different prefix at each OpenSM instance, and > the Linux host stack would work perfectly under this config. 
> If you are aware of any problem in the opensm and/or the > host stack please let the community know and the maintainers > will fix it. > > Or. > From todd.rimmer at qlogic.com Fri Apr 4 08:14:14 2008 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Fri, 4 Apr 2008 10:14:14 -0500 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> References: <1207319819.1750.72.camel@pc.ilinx><1207320958.15625.47.camel@hrosenstock-ws.xsigo.com><1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> > From: Hal Rosenstock > Sent: Friday, April 04, 2008 11:08 AM > To: Brian J. Murrell > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] can not join due to rate:2.5Gbps < > group:10Gbps? > > On Fri, 2008-04-04 at 11:05 -0400, Brian J. Murrell wrote: > > On Fri, 2008-04-04 at 07:55 -0700, Hal Rosenstock wrote: > > > > > > Looks like you have a mixed rate set of ports so you need to configure > > > the group to 2.5 Gbps. > > > > I'm a bit green with I/B, so please bear with me if you can. I do > > understand that there can be mixed rates depending on hardware. But the > > "hardware guys" assure me the cards in these machines should be able to > > do 10Gbps. Maybe they are wrong. The card is listing as: > > > > 06:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) I would not recommend reconfiguring your SM for this situation. Instead, you most likely have a bad cable or possibly a bad HCA or switch port. All IB products shipped within the last 6 years support 10g, so the fact your system has negotiated to 2.5g indicates a problem with the link. Bad or poorly connected cables are the typical cause. 
Todd Rimmer Chief Architect QLogic System Interconnect Group Voice: 610-233-4852 Fax: 610-233-4777 Todd.Rimmer at QLogic.com www.QLogic.com From hrosenstock at xsigo.com Fri Apr 4 08:19:08 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 08:19:08 -0700 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> Message-ID: <1207322348.15625.54.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 10:14 -0500, Todd Rimmer wrote: > > From: Hal Rosenstock > > Sent: Friday, April 04, 2008 11:08 AM > > To: Brian J. Murrell > > Cc: general at lists.openfabrics.org > > Subject: Re: [ofa-general] can not join due to rate:2.5Gbps < > > group:10Gbps? > > > > On Fri, 2008-04-04 at 11:05 -0400, Brian J. Murrell wrote: > > > On Fri, 2008-04-04 at 07:55 -0700, Hal Rosenstock wrote: > > > > > > > > Looks like you have a mixed rate set of ports so you need to > configure > > > > the group to 2.5 Gbps. > > > > > > I'm a bit green with I/B, so please bear with me if you can. I do > > > understand that there can be mixed rates depending on hardware. But > the > > > "hardware guys" assure me the cards in these machines should be able > to > > > do 10Gbps. Maybe they are wrong. The card is listing as: > > > > > > 06:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev > a1) > I would not recommend reconfiguring your SM for this situation. > Instead, you most likely have a bad cable or possibly a bad HCA or > switch port. All IB products shipped within the last 6 years support > 10g, so the fact your system has negotiated to 2.5g indicates a problem > with the link. 
> > Bad or poorly connected cables are the typical cause. Yes, this seems right; I misread this as the DDR/SDR issue. I would doubt he has any 1x hardware. -- Hal > Todd Rimmer > Chief Architect > QLogic System Interconnect Group > Voice: 610-233-4852 Fax: 610-233-4777 > Todd.Rimmer at QLogic.com www.QLogic.com > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From Brian.Murrell at Sun.COM Fri Apr 4 08:25:56 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 11:25:56 -0400 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> Message-ID: <1207322756.1750.86.camel@pc.ilinx> On Fri, 2008-04-04 at 10:14 -0500, Todd Rimmer wrote: > I would not recommend reconfiguring your SM for this situation. Indeed, if what you say below pans out, I'd rather not. > Instead, you most likely have a bad cable or possibly a bad HCA or > switch port. All IB products shipped within the last 6 years support > 10g, so the fact your system has negotiated to 2.5g indicates a problem > with the link. OK. I will investigate this. Is there any more direct method of determining what rate an HCA has negotiated than using the "ibdiagpath -l $nid" mechanism that I have been using? It seems like a kind of round-about method of getting that information. > Bad or poorly connected cables are the typical cause. I will have the hardware guys take another look at that. Thanx for all the pointers! b. 
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From changquing.tang at hp.com Fri Apr 4 08:26:50 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Fri, 4 Apr 2008 15:26:50 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <15ddcffd0804032122i2993bd00x84d9d38d2b7f34ba@mail.gmail.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <15ddcffd0804032122i2993bd00x84d9d38d2b7f34ba@mail.gmail.com> Message-ID: > > for example, in MPI, process A know the HCA guid on another node. > > After running for some time, the switch is restarted for > some reason, and the whole fabric is re-configured. > > > CQ, > > If by "the whole fabric is re-configured" you refer to a case > where a subnet prefix changes while a job runs and a process > is detached/reattached to the job so now you want to adopt > your design to handle it, is over engineering, why you want > to do that? > I am concerned about the port LID change. It is always best if a process can figure out the info it needs by itself; an SA query is the right way and is in the IB spec. While it is possible to let processes exchange information (port LID) again, there are difficulties: in the middle of a long job run, it is hard to get two processes to coordinate such an information exchange, and it requires a second channel to do so. If the second channel is IPoIB, it is broken as well, and we need to re-establish it again. I just ask for the SA functionality. If it is not possible, we have to use a very complicated way to let HP-MPI survive a network failure. --CQ > Or. 
> From hrosenstock at xsigo.com Fri Apr 4 08:29:34 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 08:29:34 -0700 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207322756.1750.86.camel@pc.ilinx> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> <1207322756.1750.86.camel@pc.ilinx> Message-ID: <1207322974.15625.57.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 11:25 -0400, Brian J. Murrell wrote: > On Fri, 2008-04-04 at 10:14 -0500, Todd Rimmer wrote: > > I would not recommend reconfiguring your SM for this situation. > > Indeed, if what you say below pans out, I'd rather not. > > > Instead, you most likely have a bad cable or possibly a bad HCA or > > switch port. All IB products shipped within the last 6 years support > > 10g, so the fact your system has negotiated to 2.5g indicates a problem > > with the link. > > OK. I will investigate this. Is there any more direct method of > determining what rate an HCA has negotiated than using the "ibdiagpath > -l $nid" mechanism that I have been using? It seems like a kind of > round-about method of getting that information. Try ibcheckwidth for this particular problem > > Bad or poorly connected cables are the typical cause. > > I will have the hardware guys take another look at that. > > Thanx for all the pointers! > > b. 
> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Fri Apr 4 08:47:29 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 08:47:29 -0700 Subject: [ofa-general] linux-next: infiniband build failure In-Reply-To: <20080404133204.3edc0470.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Fri, 4 Apr 2008 13:32:04 +1100") References: <20080404133204.3edc0470.sfr@canb.auug.org.au> Message-ID: > drivers/infiniband/hw/ehca/ehca_reqs.c: In function 'ehca_write_swqe': > drivers/infiniband/hw/ehca/ehca_reqs.c:191: error: 'const struct ib_send_wr' has no member named 'imm_data' Oops, thanks, I forgot to run my cross-compile (and ehca is ppc only). Anyway, your fix is correct and I rolled it into my patch. Thanks! From Thomas.Talpey at netapp.com Fri Apr 4 08:56:23 2008 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Fri, 04 Apr 2008 11:56:23 -0400 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: References: <47F5689E.90101@iol.unh.edu> <47F576B2.300@iol.unh.edu> Message-ID: At 08:52 PM 4/3/2008, Roland Dreier wrote: >But does this code start working if we add the two patches I posted? I >don't understand how you could do anything useful with the current state >of things plus send w/inval for amso1100. Does send w/inv actually work end-to-end on the Ammasso? Who's testing it? Just wondering. Tom. From rdreier at cisco.com Fri Apr 4 09:06:42 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 09:06:42 -0700 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? 
In-Reply-To: <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> (Or Gerlitz's message of "Fri, 4 Apr 2008 07:47:40 +0300") References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: > If not, can some notes be sent to the list? I say lets learn from what > you did so far... In my experience, getting code to work over both IB and iWARP isn't that hard. The main points are: - Use the RDMA CM for connection establishment (duh) - Memory regions used to receive RDMA read responses must have "remote write" permission (since in the iWARP protocol, RDMA read responses are basically the same as incoming RDMA write requests) - Active side of the connection must do the first operation - Don't use IB-specific features (atomics, immediate data) - R. From rdreier at cisco.com Fri Apr 4 09:10:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 09:10:22 -0700 Subject: [ofa-general] Re: linux-next: infiniband build failure In-Reply-To: <20080404031020.GB24743@kroah.com> (Greg KH's message of "Thu, 3 Apr 2008 20:10:20 -0700") References: <20080404135532.70c46480.sfr@canb.auug.org.au> <20080404031020.GB24743@kroah.com> Message-ID: > Roland wanted the ib patch to go through my tree, and I figure we will > work out these issues during the 2 week merge window. Actually I said I was fine with whatever you wanted to do :) Given that the new device support for ipath seems to cause problems for ib-convert-struct-class_device-to-struct-device.patch, it seems it might be simpler for me to carry that in my tree. If someone sends me the latest patch I'll be happy to merge it in (and do the fixups for the ipath changes). Then the final struct class_device removal just needs to be merged late -- I'll send my tree to Linus to pull in the first day or two of the merge window so I shouldn't be a problem. 
Stephen, Greg, I really have the simplest job here managing my tree, compared to you two guys, so as before just let me know how you want to handle this ;) - R. From Brian.Murrell at Sun.COM Fri Apr 4 10:37:10 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 13:37:10 -0400 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207322974.15625.57.camel@hrosenstock-ws.xsigo.com> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> <1207322756.1750.86.camel@pc.ilinx> <1207322974.15625.57.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207330630.1750.108.camel@pc.ilinx> On Fri, 2008-04-04 at 08:29 -0700, Hal Rosenstock wrote: > > Try ibcheckwidth for this particular problem Well, seems I solved the problem after finding the ibstatus command. Seems the hardware guys plugged port 2 into the switch because port 1 of one of the HCAs in one of the machines is broken. Thanx for all of the help! b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From hrosenstock at xsigo.com Fri Apr 4 10:55:21 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 10:55:21 -0700 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <200804041147.27565.bs@q-leap.de> References: <200804041147.27565.bs@q-leap.de> Message-ID: <1207331721.15625.76.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 11:47 +0200, Bernd Schubert wrote: > Hello, > > opensm-3.2.1 logs some error messages like this: > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > ERR 0108: Unknown remote side for node 0 > x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling list > Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 hop path: > Path = 0,1,14,13 > > > From ibnetdiscover output I see port13 of this switch is a switch-interconnect > (sorry, I don't know what the correct name/identifier for switches within > switches): > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid 263 > 4xSDR > > > Apr 04 00:00:08 325219 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > ERR 0108: Unknown remote side for node 0 > x000b8cffff002bf9(SW_pfs1_inter6) port 9. Adding to light sweep sampling list > Apr 04 00:00:08 325234 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: > Path = 0,1,18 > > This is again an interconnection: > > [9] "S-000b8cffff002b9e"[15] # "SW_pfs1_leaf1" lid 177 > 4xDDR > > > Apr 04 00:00:08 325288 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > ERR 0108: Unknown remote side for node 0 > x000b8cffff002bfa(SW_pfs1_inter7) port 13. Adding to light sweep sampling list > Apr 04 00:00:08 325301 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: > Path = 0,1,14 > > > And again an interconnection: > > [13] "S-000b8cffff002ba2"[13] # "SW_pfs1_leaf4" lid 182 > 4xDDR > > > All the other interconnections seem to be fine. 
Any idea if OpenSM 3.1.10 has the same issue as 3.2.1 ? Is this some large Flextronics switch ? -- Hal > Thanks, > Bernd > > From rdreier at cisco.com Fri Apr 4 11:04:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 11:04:06 -0700 Subject: [ofa-general] Re: [PATCH V2] mlx4_core: increase max number of qp's to 128K In-Reply-To: <200711281008.10521.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 28 Nov 2007 10:08:10 +0200") References: <200711281008.10521.jackm@dev.mellanox.co.il> Message-ID: thanks, applied at long last. From tom at opengridcomputing.com Fri Apr 4 12:10:40 2008 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 04 Apr 2008 14:10:40 -0500 Subject: [ofa-general] [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr Message-ID: <1207336240.1363.20.camel@trinity.ogc.int> AMSO1100: Add check for NULL reply_msg in c2_intr This is a checker-found bug posted to bugzilla.kernel.org (7478). Upon inspection I also found a place where we could attempt to kmem_cache_free a null pointer. Signed-off-by: Tom Tucker --- Roland, I don't think anyone has ever hit this bug, so it is a low priority in my view. I also noticed that if we refactored vq_wait_for_reply that we could combine a common if (!reply) { err = -ENOMEM; goto bail; } construct by guaranteeing that reply is non-null if vq_wait_for_reply returns without an error. This patch, however, is much smaller. What do you think? 
drivers/infiniband/hw/amso1100/c2_cq.c | 4 ++-- drivers/infiniband/hw/amso1100/c2_intr.c | 6 +++++- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c index d2b3366..bb17cce 100644 --- a/drivers/infiniband/hw/amso1100/c2_cq.c +++ b/drivers/infiniband/hw/amso1100/c2_cq.c @@ -422,8 +422,8 @@ void c2_free_cq(struct c2_dev *c2dev, struct c2_cq *cq) goto bail1; reply = (struct c2wr_cq_destroy_rep *) (unsigned long) (vq_req->reply_msg); - - vq_repbuf_free(c2dev, reply); + if (reply) + vq_repbuf_free(c2dev, reply); bail1: vq_req_free(c2dev, vq_req); bail0: diff --git a/drivers/infiniband/hw/amso1100/c2_intr.c b/drivers/infiniband/hw/amso1100/c2_intr.c index 0d0bc33..3b50954 100644 --- a/drivers/infiniband/hw/amso1100/c2_intr.c +++ b/drivers/infiniband/hw/amso1100/c2_intr.c @@ -174,7 +174,11 @@ static void handle_vq(struct c2_dev *c2dev, u32 mq_index) return; } - err = c2_errno(reply_msg); + if (reply_msg) + err = c2_errno(reply_msg); + else + err = -ENOMEM; + if (!err) switch (req->event) { case IW_CM_EVENT_ESTABLISHED: c2_set_qp_state(req->qp, From rdreier at cisco.com Fri Apr 4 12:20:14 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:20:14 -0700 Subject: [ofa-general] error with ibv_poll_cq() call In-Reply-To: (Roland Dreier's message of "Fri, 28 Mar 2008 22:35:19 -0700") References: <200803260901.25918.jackm@dev.mellanox.co.il> Message-ID: OK, I committed my change to libmlx4 and the equivalent thing for libmthca. - R. 
From rdreier at cisco.com Fri Apr 4 12:22:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:22:06 -0700 Subject: [ofa-general] Re: [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr In-Reply-To: <1207336240.1363.20.camel@trinity.ogc.int> (Tom Tucker's message of "Fri, 04 Apr 2008 14:10:40 -0500") References: <1207336240.1363.20.camel@trinity.ogc.int> Message-ID: > I don't think anyone has ever hit this bug, so it is a low priority in my view. I also noticed that > if we refactored vq_wait_for_reply that we could combine a common > > if (!reply) { > err = -ENOMEM; > goto bail; > } > > construct by guaranteeing that reply is non-null if vq_wait_for_reply returns without > an error. This patch, however, is much smaller. What do you think? Well, now is a good time to merge either version of the fix. Would be nice to kill off one of the Coverity issues so I'm happy to take this. It's up to you how much effort you want to spend on this... the refactoring sounds nice but I think we're OK without it. - R. From Brian.Murrell at Sun.COM Fri Apr 4 12:24:28 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 15:24:28 -0400 Subject: [ofa-general] where to report bugs? Message-ID: <1207337068.1750.114.camel@pc.ilinx> I'm wondering what the official mechanism is to report bugs? Just about anything I'm going to find is likely to be limited to build and installation bugs, like this one... In infiniband-diags-1.3.6/Makefile.am we have the line: INCLUDES = -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband This is assuming that other OFED packages have been installed in the general system $PREFIX, usually /usr as $includedir should be /usr/include. But in particular, I have installed the opensm{,-devel} in an alternate location (i.e. PREFIX) and the infiniband-diags build fails with: if gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I/usr/include -I/usr/include/infiniband -I/home/brian/ofed_1.3_integration/tree/usr/include -Wall -I/home/brian/ofed_1.3_integration/tree/usr/include -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -MT src_ibnetdiscover-ibnetdiscover.o -MD -MP -MF ".deps/src_ibnetdiscover-ibnetdiscover.Tpo" -c -o src_ibnetdiscover-ibnetdiscover.o `test -f 'src/ibnetdiscover.c' || echo './'`src/ibnetdiscover.c; \ then mv -f ".deps/src_ibnetdiscover-ibnetdiscover.Tpo" ".deps/src_ibnetdiscover-ibnetdiscover.Po"; else rm -f ".deps/src_ibnetdiscover-ibnetdiscover.Tpo"; exit 1; fi In file included from src/ibnetdiscover.c:53: /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:39:29: error: complib/cl_qmap.h: No such file or directory In file included from src/ibnetdiscover.c:53: /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:45: error: expected specifier-qualifier-list before ‘cl_map_item_t’ /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:51: error: expected specifier-qualifier-list before ‘cl_qmap_t’ make[1]: *** [src_ibnetdiscover-ibnetdiscover.o] Error 1 make[1]: Leaving directory `/home/brian/rpm/BUILD/infiniband-diags-1.3.6' On my system, with opensm-devel (and all other OFED RPMs) installed in an alternate PREFIX, the above list of include paths should be s#/usr/include/infiniband#PREFIX/include/infiniband#. It seems probably infiniband-diags needs to have the same "--with-osm" switch that ibutils has. b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From richard.frank at oracle.com Fri Apr 4 13:26:04 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 15:26:04 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <47F37CA4.8000109@mellanox.co.il> Message-ID: <47F68EDC.4050107@oracle.com> > We want to add send with invalidate & mask compare and swap. > Eli will be able to send the patches next week and since they are > small I think they can be in for 2.6.26 We are very interested in these new operations and are moving in the direction of tightly integrating RDMA along with atomics (if available) into Oracle. We plan on testing some early prototypes of these in the next few months. Send with invalidate is an exact match for our current RDS V3 rdma driver - and should be more efficient than the current background syncing of the tpt to ensure keys are invalidated. We intend to expose the atomics via the RDS driver along with simple low level rdma operations to Oracle's internal clients. If Oracle is running over a transport which exports atomics and rdma, Oracle will see a dramatic performance boost for several database operations. Roland Dreier wrote: > > We want to add send with invalidate & mask compare and swap. > > Eli will be able to send the patches next week and since they are > > small I think they can be in for 2.6.26 > > Send with invalidate should be OK. Let's see about the masked atomics > stuff -- we have a ton of new verbs and I think we might want to slow > down and make sure it all makes sense. > > > What about the split CQ for UD mode? It's improved the IPoIB > > performance for small messages significantly. > > Oh yeah... I'll try to get that in too. > > > mlx4- we plan to send patches for the low level driver only to enable > > mlx4_en. These only affect our low level driver. 
> > No problem in principle, let's see the actual patches. > > > I think we should try to push for XEC in 2.6.26 since there are > > already MPI implementation that use it and this ties them to use OFED > > only. > > Also this feature is stable and now being defined in IBTA > > Not taking it causing changes between OFED and the kernel and your > > libibverbs and we wish to avoid such gaps. > > Is there any thing we can do to help and make it into 2.6.26? > > I don't have a good feeling that the user-kernel interface is well > > thought out, so I want to consider XRC + ehca LL stuff + new iWARP verbs > > and make sure we have something that makes sense for the future. > > > > - R. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general 
From tom at opengridcomputing.com Fri Apr 4 12:32:43 2008 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 04 Apr 2008 14:32:43 -0500 Subject: [ofa-general] Re: [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr In-Reply-To: References: <1207336240.1363.20.camel@trinity.ogc.int> Message-ID: <1207337563.1363.22.camel@trinity.ogc.int> On Fri, 2008-04-04 at 12:22 -0700, Roland Dreier wrote: > > I don't think anyone has ever hit this bug, so it is a low priority in my view. I also noticed that > > if we refactored vq_wait_for_reply that we could combine a common > > > > if (!reply) { > > err = -ENOMEM; > > goto bail; > > } > > > > construct by guaranteeing that reply is non-null if vq_wait_for_reply returns without > > an error. This patch, however, is much smaller. What do you think? 
> > Well, now is a good time to merge either version of the fix. Would be > nice to kill off one of the Coverity issues so I'm happy to take this. > > It's up to you how much effort you want to spend on this... the > refactoring sounds nice but I think we're OK without it. > I'm up to my eyeballs right now. If it's ok with you I'd say defer the refactoring. > - R. From rdreier at cisco.com Fri Apr 4 12:34:52 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:34:52 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F68EDC.4050107@oracle.com> (Richard Frank's message of "Fri, 04 Apr 2008 15:26:04 -0500") References: <47F37CA4.8000109@mellanox.co.il> <47F68EDC.4050107@oracle.com> Message-ID: > We are very interested in these new operations and are moving in the > direction of tightly integrating RDMA along with atomics (if > available) into Oracle. We plan on testing some early prototypes of > the these in the few months. And you need the ConnectX-only masked atomics? Or do the standard IB atomic operations work for you? Of course using atomics at all means that things don't work on iWARP. > Send with invalidate is an exact match for our current RDS V3 rdma > driver - and should be more efficient than the current background > syncing of the tpt to ensure keys are invalidated. How does send with invalidate interact with the current IB FMR stuff? Seems that you would run into trouble keeping the state of the FMR straight if the remote side is invalidating them. Also I would think that send-with-invalidate would be much more expensive than the current FMR method of batching up the invalidates, since you don't get to amortize the cost of syncing up all the internal HCA state. - R. 
From rdreier at cisco.com Fri Apr 4 12:35:39 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:35:39 -0700 Subject: [ofa-general] Re: [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr In-Reply-To: <1207337563.1363.22.camel@trinity.ogc.int> (Tom Tucker's message of "Fri, 04 Apr 2008 14:32:43 -0500") References: <1207336240.1363.20.camel@trinity.ogc.int> <1207337563.1363.22.camel@trinity.ogc.int> Message-ID: > I'm up to my eyeballs right now. If it's ok with you I'd say defer the > refactoring. No problem, I'll queue this up and if you ever get time to work on amso1100 you can send the refactoring. But are you working on a pmtu fix? - R. From rdreier at cisco.com Fri Apr 4 12:38:15 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:38:15 -0700 Subject: [ofa-general] Re: [PATCH 7/10] IB/ipoib: Add ethtool support In-Reply-To: <1205767448.25950.142.camel@mtls03> (Eli Cohen's message of "Mon, 17 Mar 2008 17:24:08 +0200") References: <1205767448.25950.142.camel@mtls03> Message-ID: thanks, applied From hrosenstock at xsigo.com Fri Apr 4 12:56:00 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 12:56:00 -0700 Subject: [ofa-general] where to report bugs? In-Reply-To: <1207337068.1750.114.camel@pc.ilinx> References: <1207337068.1750.114.camel@pc.ilinx> Message-ID: <1207338960.15625.147.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 15:24 -0400, Brian J. Murrell wrote: > I'm wondering what the official mechanism is to report bugs? http://www.openfabrics.org/bugzilla but that's usually used when email is insufficient and some issue needs tracking but it's up to you. 
-- Hal From rdreier at cisco.com Fri Apr 4 12:58:16 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:58:16 -0700 Subject: [ofa-general] Re: [PATCH 10/10] IB/mlx4: add support for modifying CQ parameters In-Reply-To: <1205767465.25950.144.camel@mtls03> (Eli Cohen's message of "Mon, 17 Mar 2008 17:24:25 +0200") References: <1205767465.25950.144.camel@mtls03> Message-ID: thanks, I applied 8/10 and 9/10, and changed this one around a bit before applying it... it seemed cleaner to me not to expose the CQ context to the mlx4_ib driver. For CQ resize we can just add a new mlx4_cq_resize() function in mlx4_core, since the context parameters that matter there are completely different. (And there's no need for mlx4_ib to worry about either the modify moderation or resize cases) >From a1f375e52ce0b39bebaa27adc6d3724816f7e963 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Mon, 17 Mar 2008 17:24:25 +0200 Subject: [PATCH] IB/mlx4: Add support for modifying CQ moderation parameters Signed-off-by: Eli Cohen Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 8 ++++++++ drivers/infiniband/hw/mlx4/main.c | 1 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 1 + drivers/net/mlx4/cq.c | 31 +++++++++++++++++++++++++++++++ include/linux/mlx4/cmd.h | 2 +- include/linux/mlx4/cq.h | 3 +++ 6 files changed, 45 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 7d70af7..e4fb64b 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -85,6 +85,14 @@ static struct mlx4_cqe *next_cqe_sw(struct mlx4_ib_cq *cq) return get_sw_cqe(cq, cq->mcq.cons_index); } +int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period) +{ + struct mlx4_ib_cq *mcq = to_mcq(cq); + struct mlx4_ib_dev *dev = to_mdev(cq->device); + + return mlx4_cq_modify(dev->dev, &mcq->mcq, cq_count, cq_period); +} + struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, struct 
ib_ucontext *context, struct ib_udata *udata) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index e9330a0..76dd45c 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -609,6 +609,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) ibdev->ib_dev.post_send = mlx4_ib_post_send; ibdev->ib_dev.post_recv = mlx4_ib_post_recv; ibdev->ib_dev.create_cq = mlx4_ib_create_cq; + ibdev->ib_dev.modify_cq = mlx4_ib_modify_cq; ibdev->ib_dev.destroy_cq = mlx4_ib_destroy_cq; ibdev->ib_dev.poll_cq = mlx4_ib_poll_cq; ibdev->ib_dev.req_notify_cq = mlx4_ib_arm_cq; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 3f8bd0a..ef8ad96 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -254,6 +254,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, struct ib_udata *udata); int mlx4_ib_dereg_mr(struct ib_mr *mr); +int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period); struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, struct ib_ucontext *context, struct ib_udata *udata); diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index d4441fe..00a270b 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -121,6 +121,13 @@ static int mlx4_SW2HW_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, MLX4_CMD_TIME_CLASS_A); } +static int mlx4_MODIFY_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, + int cq_num, u32 opmod) +{ + return mlx4_cmd(dev, mailbox->dma, cq_num, opmod, MLX4_CMD_MODIFY_CQ, + MLX4_CMD_TIME_CLASS_A); +} + static int mlx4_HW2SW_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, int cq_num) { @@ -129,6 +136,30 @@ static int mlx4_HW2SW_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, MLX4_CMD_TIME_CLASS_A); } +int mlx4_cq_modify(struct mlx4_dev *dev, struct mlx4_cq *cq, + u16 count, u16 period) +{ + 
struct mlx4_cmd_mailbox *mailbox; + struct mlx4_cq_context *cq_context; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + cq_context = mailbox->buf; + memset(cq_context, 0, sizeof *cq_context); + + cq_context->cq_max_count = cpu_to_be16(count); + cq_context->cq_period = cpu_to_be16(period); + + err = mlx4_MODIFY_CQ(dev, mailbox, cq->cqn, 1); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_cq_modify); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) { diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h index 7d1eaa9..77323a7 100644 --- a/include/linux/mlx4/cmd.h +++ b/include/linux/mlx4/cmd.h @@ -81,7 +81,7 @@ enum { MLX4_CMD_SW2HW_CQ = 0x16, MLX4_CMD_HW2SW_CQ = 0x17, MLX4_CMD_QUERY_CQ = 0x18, - MLX4_CMD_RESIZE_CQ = 0x2c, + MLX4_CMD_MODIFY_CQ = 0x2c, /* SRQ commands */ MLX4_CMD_SW2HW_SRQ = 0x35, diff --git a/include/linux/mlx4/cq.h b/include/linux/mlx4/cq.h index 1243eba..f7c3511 100644 --- a/include/linux/mlx4/cq.h +++ b/include/linux/mlx4/cq.h @@ -130,4 +130,7 @@ enum { MLX4_CQ_DB_REQ_NOT = 2 << 24 }; +int mlx4_cq_modify(struct mlx4_dev *dev, struct mlx4_cq *cq, + u16 count, u16 period); + #endif /* MLX4_CQ_H */ -- 1.5.4.5 From michael.heinz at qlogic.com Fri Apr 4 13:08:18 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Fri, 4 Apr 2008 15:08:18 -0500 Subject: [ofa-general] MVAPICH2 crashes on mixed fabric Message-ID: Hey, all, I'm not sure if this is a known bug or some sort of limitation I'm unaware of, but I've been building and testing with the OFED 1.3 GA release on a small fabric that has a mix of Arbel-based and newer Connect-X HCAs. What I've discovered is that mvapich and openmpi work fine across the entire fabric, but mvapich2 crashes when I use a mix of Arbels and Connect-X. 
The errors vary depending on the test program but here's an example:

[mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
. . . (output snipped) . . .
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 3 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]  Mbytes/sec
            0         1000         3.51         3.51         3.51        0.00
            1         1000         3.63         3.63         3.63        0.52
            2         1000         3.67         3.67         3.67        1.04
            4         1000         3.64         3.64         3.64        2.09
            8         1000         3.67         3.67         3.67        4.16
           16         1000         3.67         3.67         3.67        8.31
           32         1000         3.74         3.74         3.74       16.32
           64         1000         3.90         3.90         3.90       31.28
          128         1000         4.75         4.75         4.75       51.39
          256         1000         5.21         5.21         5.21       93.79
          512         1000         5.96         5.96         5.96      163.77
         1024         1000         7.88         7.89         7.89      247.54
         2048         1000        11.42        11.42        11.42      342.00
         4096         1000        15.33        15.33        15.33      509.49
         8192         1000        22.19        22.20        22.20      703.83
        16384         1000        34.57        34.57        34.57      903.88
        32768         1000        51.32        51.32        51.32     1217.94
        65536          640        85.80        85.81        85.80     1456.74
       131072          320       155.23       155.24       155.24     1610.40
       262144          160       301.84       301.86       301.85     1656.39
       524288           80       598.62       598.69       598.66     1670.31
      1048576           40      1175.22      1175.30      1175.26     1701.69
      2097152           20      2309.05      2309.05      2309.05     1732.32
      4194304           10      4548.72      4548.98      4548.85     1758.64
[0] Abort: Got FATAL event 3 at line 796 in file ibv_channel_manager.c
rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

If, however, I define my mpdring to contain only Connect-X systems OR only Arbel systems, IMB-MPI1 runs to completion. Can anyone suggest a workaround, or is this a real bug with mvapich2? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed...
URL: From andrea at qumranet.com Fri Apr 4 13:20:56 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 4 Apr 2008 22:20:56 +0200 Subject: [ofa-general] [PATCH] mmu notifier #v11 In-Reply-To: References: <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> Message-ID: <20080404202055.GA14784@duo.random> This should guarantee that nobody can register when any of the mmu notifiers is running avoiding all the races including guaranteeing range_start not to be missed. I'll adapt the other patches to provide the sleeping-feature on top of this (only needed by XPMEM) soon. KVM seems to run fine on top of this one. Andrew can you apply this to -mm? Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1050,6 +1050,9 @@ unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +extern void mm_lock(struct mm_struct *mm); +extern void mm_unlock(struct mm_struct *mm); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -225,6 +225,9 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct hlist_head mmu_notifier_list; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,175 @@ +#ifndef _LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; 
+ +#ifdef CONFIG_MMU_NOTIFIER + +struct mmu_notifier_ops { + /* + * Called when nobody can register any more notifier in the mm + * and after the "mn" notifier has been disarmed already. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired. Multiple invalidate_range_start/ends may be nested + * or called concurrently. 
+ */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(!hlist_empty(&mm->mmu_notifier_list)); +} + +extern void mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void 
mmu_notifier_mm_init(struct mm_struct *mm) +{ + INIT_HLIST_HEAD(&mm->mmu_notifier_list); +} + +#define ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } diff --git a/mm/Kconfig b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ -193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + 
+config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,5 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include 
+#include #include #include @@ -611,6 +612,9 @@ if (is_vm_hugetlb_page(vma)) return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { @@ -621,6 +625,11 @@ vma, addr, next)) return -ENOMEM; } while (dst_pgd++, src_pgd++, addr = next, addr != end); + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); + return 0; } @@ -897,7 +906,9 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, address, end); end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); + mmu_notifier_invalidate_range_end(mm, address, end); if (tlb) tlb_finish_mmu(tlb, address, end); return end; @@ -1463,10 +1474,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1486,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1688,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -1747,11 +1748,13 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); tlb_finish_mmu(tlb, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2037,6 +2040,7 @@ unsigned long end; /* mm's last user has gone, and its about to be pulled down */ + mmu_notifier_release(mm); arch_exit_mmap(mm); lru_add_drain(); @@ -2242,3 +2246,69 @@ return 0; } + +static void mm_lock_unlock(struct mm_struct *mm, int lock) +{ + struct vm_area_struct *vma; + spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; + + i_mmap_lock_last = NULL; + for (;;) { + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) + if (vma->vm_file && vma->vm_file->f_mapping && + (unsigned long) i_mmap_lock > + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock && + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock > + (unsigned long) i_mmap_lock_last) + i_mmap_lock = + &vma->vm_file->f_mapping->i_mmap_lock; + if (i_mmap_lock == (spinlock_t *) -1UL) + break; + i_mmap_lock_last = i_mmap_lock; + if (lock) + spin_lock(i_mmap_lock); + else + spin_unlock(i_mmap_lock); + } + + anon_vma_lock_last = NULL; + for (;;) { + spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) + if (vma->anon_vma && + (unsigned long) anon_vma_lock > + (unsigned long) &vma->anon_vma->lock && + (unsigned long) 
&vma->anon_vma->lock > + (unsigned long) anon_vma_lock_last) + anon_vma_lock = &vma->anon_vma->lock; + if (anon_vma_lock == (spinlock_t *) -1UL) + break; + anon_vma_lock_last = anon_vma_lock; + if (lock) + spin_lock(anon_vma_lock); + else + spin_unlock(anon_vma_lock); + } +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. + */ +void mm_lock(struct mm_struct * mm) +{ + down_write(&mm->mmap_sem); + mm_lock_unlock(mm, 1); +} + +void mm_unlock(struct mm_struct *mm) +{ + mm_lock_unlock(mm, 0); + up_write(&mm->mmap_sem); +} diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,100 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include +#include +#include + +/* + * No synchronization. This function can only be called when only a single + * process remains that performs teardown. + */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { + mn = hlist_entry(mm->mmu_notifier_list.first, + struct mmu_notifier, + hlist); + hlist_del(&mn->hlist); + if (mn->ops->release) + mn->ops->release(mn, mm); + } +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending if the mapping previously + * existed or not. 
+ */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, address); + } + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, start, end); + } +} + +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, start, end); + } +} + +/* + * Must not hold mmap_sem nor any other VM related lock when calling + * this registration function. 
+ */ +void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + mm_lock(mm); + hlist_add_head(&mn->hlist, &mm->mmu_notifier_list); + mm_unlock(mm); +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, 
address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) From or.gerlitz at gmail.com Fri Apr 4 13:23:18 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 23:23:18 +0300 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> On Fri, Apr 4, 2008 at 7:06 PM, Roland Dreier wrote: > - Don't use IB-specific features (atomics, immediate data) and don't use RNRs as a means for HW based "flow control" mechanism. 
The current RDS implementation does not have a SW based flow control but rather does some sort of back pressure through SW based congestion management. I think that to some extent it relies on RNRs which don't exist under iWARP. Or. From or.gerlitz at gmail.com Fri Apr 4 13:25:32 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 23:25:32 +0300 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F63E33.5080709@opengridcomputing.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> Message-ID: <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> On Fri, Apr 4, 2008 at 5:41 PM, Steve Wise wrote: > We won't be in Sonoma, but perhaps Jon can email some info to the list on > what we've done to-date for open mpi. This would be very much helpful, best if done before Monday so we can discuss there the RDS port with the maintainer. Jon - any chance you will be able to send something (even raw, sketch)? Or. From richard.frank at oracle.com Fri Apr 4 14:27:52 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 16:27:52 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> Message-ID: <47F69D58.6040800@oracle.com> Hmmm - so what happens with IWARP NIC when no buffer is posted on recv q and a message arrives ? 
Or Gerlitz wrote: > On Fri, Apr 4, 2008 at 7:06 PM, Roland Dreier wrote: > >> - Don't use IB-specific features (atomics, immediate data) >> > > and don't use RNRs as a means for HW based "flow control" mechanism. > The current RDS implementation > does not have a SW based flow control but rather does some sort of > back pressure through SW based congestion > management. I think that to some extent it relies on RNRs which don't > exist under iWARP. > > Or. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From richard.frank at oracle.com Fri Apr 4 14:28:38 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 16:28:38 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> Message-ID: <47F69D86.9040407@oracle.com> How about a pointer to an IWARP spec - so we can sort out all the details.../ implications...to RDS. Or Gerlitz wrote: > On Fri, Apr 4, 2008 at 5:41 PM, Steve Wise wrote: > >> We won't be in Sonoma, but perhaps Jon can email some info to the list on >> what we've done to-date for open mpi. >> > > This would be very much helpful, best if done before Monday so we can > discuss there the RDS port with the maintainer. > Jon - any chance you will be able to send something (even raw, sketch)? > > Or. 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From or.gerlitz at gmail.com Fri Apr 4 13:30:51 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 23:30:51 +0300 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F69D58.6040800@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> <47F69D58.6040800@oracle.com> Message-ID: <15ddcffd0804041330h3df8497tc81776ebfd106a19@mail.gmail.com> On Sat, Apr 5, 2008 at 12:27 AM, Richard Frank wrote: > Hmmm - so what happens with IWARP NIC when no buffer is posted on recv q and > a message arrives ? I am quite sure the L2 ethernet HW just drops it, but you better verify this with an iWARP HW provider. Or. From weiny2 at llnl.gov Fri Apr 4 13:31:37 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 4 Apr 2008 13:31:37 -0700 Subject: [ofa-general] where to report bugs? In-Reply-To: <1207337068.1750.114.camel@pc.ilinx> References: <1207337068.1750.114.camel@pc.ilinx> Message-ID: <20080404133137.083027ae.weiny2@llnl.gov> On Fri, 04 Apr 2008 15:24:28 -0400 "Brian J. Murrell" wrote: > I'm wondering what the official mechanism is to report bugs? Just about > anything I'm going to find is likely to be limited to build and > installation bugs, like this one... > > In infiniband-diags-1.3.6/Makefile.am we have the line: > > INCLUDES = -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband > > This is assuming that other OFED packages have been installed in the > general system $PREFIX, usually /usr as $includedir should > be /usr/include. 
> > But in particular, I have installed the opensm{,-devel} in an alternate > location (i.e. PREFIX) and the infiniband-diags build fails with: Are you specifying --prefix on the infiniband-diags configure? I think that should work. Ira > > if gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I/usr/include -I/usr/include/infiniband -I/home/brian/ofed_1.3_integration/tree/usr/include -Wall -I/home/brian/ofed_1.3_integration/tree/usr/include -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -MT src_ibnetdiscover-ibnetdiscover.o -MD -MP -MF ".deps/src_ibnetdiscover-ibnetdiscover.Tpo" -c -o src_ibnetdiscover-ibnetdiscover.o `test -f 'src/ibnetdiscover.c' || echo './'`src/ibnetdiscover.c; \ > then mv -f ".deps/src_ibnetdiscover-ibnetdiscover.Tpo" ".deps/src_ibnetdiscover-ibnetdiscover.Po"; else rm -f ".deps/src_ibnetdiscover-ibnetdiscover.Tpo"; exit 1; fi > In file included from src/ibnetdiscover.c:53: > /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:39:29: error: complib/cl_qmap.h: No such file or directory > In file included from src/ibnetdiscover.c:53: > /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:45: error: expected specifier-qualifier-list before ‘cl_map_item_t’ > /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:51: error: expected specifier-qualifier-list before ‘cl_qmap_t’ > make[1]: *** [src_ibnetdiscover-ibnetdiscover.o] Error 1 > make[1]: Leaving directory `/home/brian/rpm/BUILD/infiniband-diags-1.3.6' > > On my system, with opensm-devel (and all other OFED RPMs) installed in > an alternate PREFIX, the above list of include paths should be > s#/usr/include/infiniband#PREFIX/include/infiniband#. > > It seems probably infiniband-diags needs to have the same "--with-osm" > switch that ibutils has. > > b. > > From Brian.Murrell at Sun.COM Fri Apr 4 13:43:07 2008 From: Brian.Murrell at Sun.COM (Brian J. 
Murrell) Date: Fri, 04 Apr 2008 16:43:07 -0400 Subject: [ofa-general] where to report bugs? In-Reply-To: <20080404133137.083027ae.weiny2@llnl.gov> References: <1207337068.1750.114.camel@pc.ilinx> <20080404133137.083027ae.weiny2@llnl.gov> Message-ID: <1207341787.1750.123.camel@pc.ilinx> On Fri, 2008-04-04 at 13:31 -0700, Ira Weiny wrote: > > Are you specifying --prefix on the infiniband-diags configure? Ahhh. That would have the undesired effect of relocating my infiniband-diags wherever I specify --prefix. This is not quite what I want. The ugly details are about to come out. The problem is that I am not setting a --prefix when I build any of the prerequisite packages (i.e. opensm, the libraries it depends on, etc.) as I want everything to actually have a /usr prefix, however for the purposes of building this stack from the downloadable package of what's basically SRPMs, I install the prerequisites into a temporary path. So I have a dir "./tree/" in which I use rpm2cpio < $rpm | cpio -id to roll the packages into and then point the various configure scripts to using various --with-* options. This method has worked so far for: SRPMS/libibcommon-1.0.8-1.ofed1.3 SRPMS/libibumad-1.1.7-1.ofed1.3 SRPMS/opensm-3.1.10-1.ofed1.3 SRPMS/ibutils-1.2-1.ofed1.3 SRPMS/libibmad-1.1.6-1.ofed1.3 The overall problem is that I cannot taint my pristine build environment by going along the normal process of "build rpm, install it, build next rpm, install it, etc.", so I have to install prerequisite RPMs into a sandbox and point subsequent users (in the build process) of it into the sandbox. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From akepner at sgi.com Fri Apr 4 13:47:58 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Fri, 4 Apr 2008 13:47:58 -0700 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? 
Message-ID: <20080404204758.GU29410@sgi.com> I know it's a long shot, but has anyone tried using OFED on a kernel with 64Kbyte pages? SGI would like to support that, but I've gotten reports that something is not working (e.g., "ib_rdma_bw" doesn't work on an ia64 kernel with 64Kb pages). This is with the mthca driver, fwiw. Unfortunately a conspiracy of h/w prevents me from reproducing this right now, so I don't have more details. But I'd be very curious to know if anyone can verify that OFED does/doesn't work with 64Kbyte pages. -- Arthur From rdreier at cisco.com Fri Apr 4 13:55:11 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 13:55:11 -0700 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F69D86.9040407@oracle.com> (Richard Frank's message of "Fri, 04 Apr 2008 16:28:38 -0500") References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> <47F69D86.9040407@oracle.com> Message-ID: > How about a pointer to an IWARP spec - so we can sort out all the > details.../ implications...to RDS. www.rdmaconsortium.org has most of it... the verbs are at: http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf the iWARP RDMA protocol is RFC 5040 et al: http://www.ietf.org/rfc/rfc5040.txt (the next few RFCs have lower-level details) From rdreier at cisco.com Fri Apr 4 14:02:03 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:02:03 -0700 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? 
In-Reply-To: <15ddcffd0804041330h3df8497tc81776ebfd106a19@mail.gmail.com> (Or Gerlitz's message of "Fri, 4 Apr 2008 23:30:51 +0300") References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> <47F69D58.6040800@oracle.com> <15ddcffd0804041330h3df8497tc81776ebfd106a19@mail.gmail.com> Message-ID: > > Hmmm - so what happens with IWARP NIC when no buffer is posted on recv q and > > a message arrives ? > > I am quite sure the L2 ethernet HW just drops it, but you better > verify this with an iWARP HW provider. Why would it be dropped at L2? What I believe will happen is that it will generate an error at the DDP layer that will probably result in the connection being closed. Section 7.1 of RFC 5041 says: For non-zero-length Untagged DDP Segments, the DDP Segment MUST be validated before Placement by verifying: ["untagged DDP segments" are incoming send data, as vs. "tagged" RDMA operations] 2. The QN and MSN have an associated buffer that allows Placement of the payload. Implementers' note: DDP implementations SHOULD consider lack of an associated buffer as a system fault. DDP implementations MAY try to recover from the system fault using LLP means in a ULP- transparent way. DDP implementations SHOULD NOT permit system faults to occur repeatedly or frequently. If there is not an associated buffer, DDP implementations MAY choose to disable the stream for the reception and report an error to the ULP at the Data Sink. From rdreier at cisco.com Fri Apr 4 14:03:55 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:03:55 -0700 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? 
In-Reply-To: <20080404204758.GU29410@sgi.com> (akepner@sgi.com's message of "Fri, 4 Apr 2008 13:47:58 -0700") References: <20080404204758.GU29410@sgi.com> Message-ID: > I know it's a long shot, but has anyone tried using OFED on > a kernel with 64Kbyte pages? > > SGI would like to support that, but I've gotten reports that > something is not working (e.g., "ib_rdma_bw" doesn't work on > an ia64 kernel with 64Kb pages). This is with the mthca driver, > fwiw. > > Unfortunately a conspiracy of h/w prevents me from reproducing > this right now, so I don't have more details. But I'd be very > curious to know if anyone can verify that OFED does/doesn't > work with 64Kbyte pages. I don't know about OFED, but I've tried various things on 64KB PAGE_SIZE systems and it seems to work. It wouldn't surprise me if there are issues since the drivers and firmware gets a lot less testing in such situations but it "should work" -- I'd be happy to help debug if anyone has concrete problems. - R. From weiny2 at llnl.gov Fri Apr 4 14:06:46 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 4 Apr 2008 14:06:46 -0700 Subject: [ofa-general] where to report bugs? In-Reply-To: <1207341787.1750.123.camel@pc.ilinx> References: <1207337068.1750.114.camel@pc.ilinx> <20080404133137.083027ae.weiny2@llnl.gov> <1207341787.1750.123.camel@pc.ilinx> Message-ID: <20080404140646.05387839.weiny2@llnl.gov> On Fri, 04 Apr 2008 16:43:07 -0400 "Brian J. Murrell" wrote: > On Fri, 2008-04-04 at 13:31 -0700, Ira Weiny wrote: > > > > Are you specifying --prefix on the infiniband-diags configure? > > Ahhh. That would have the undesired effect of relocating my > infiniband-diags wherever I specify --prefix. This is not quite what I > want. > > The ugly details are about to come out. > > The problem is that I am not setting a --prefix when I build any of the > prerequisite packages (i.e. opensm, the libraries it depends on, etc.) 
> as I want everything to actually have a /usr prefix, however for the > purposes of building this stack from the downloadable package of what's > basically SRPMs, I install the prerequisites into a temporary path. > > So I have a dir "./tree/" in which I use rpm2cpio < $rpm | cpio -id to > roll the packages into and then point the various configure scripts to > using various --with-* options. This method has worked so far for: > > SRPMS/libibcommon-1.0.8-1.ofed1.3 > SRPMS/libibumad-1.1.7-1.ofed1.3 > SRPMS/opensm-3.1.10-1.ofed1.3 > SRPMS/ibutils-1.2-1.ofed1.3 > SRPMS/libibmad-1.1.6-1.ofed1.3 > > The overall problem is that I cannot taint my pristine build environment > by going along the normal process of "build rpm, install it, build next > rpm, install it, etc.", so I have to install prerequisite RPMs into a > sandbox and point subsequent users (in the build process) of it into the > sandbox. > So I guess you want something like: export CPPFLAGS="-I/include" Before you do the configure and build? Ira From rdreier at cisco.com Fri Apr 4 14:12:11 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:12:11 -0700 Subject: [ofa-general] Re: [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Wed, 02 Apr 2008 15:50:28 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> Message-ID: By the way... > +int ipath_user_sdma_pkt_sent(const struct ipath_user_sdma_queue *pq, > + u32 counter) > +{ > + const u32 scounter = ipath_user_sdma_complete_counter(pq); > + const s32 dcounter = scounter - counter; > + > + return dcounter >= 0; > +} I don't see this called anywhere... should I just delete it? From Brian.Murrell at Sun.COM Fri Apr 4 14:13:42 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 17:13:42 -0400 Subject: [ofa-general] where to report bugs? 
In-Reply-To: <20080404140646.05387839.weiny2@llnl.gov> References: <1207337068.1750.114.camel@pc.ilinx> <20080404133137.083027ae.weiny2@llnl.gov> <1207341787.1750.123.camel@pc.ilinx> <20080404140646.05387839.weiny2@llnl.gov> Message-ID: <1207343622.1750.128.camel@pc.ilinx> On Fri, 2008-04-04 at 14:06 -0700, Ira Weiny wrote: > So I guess you want something like: > > export CPPFLAGS="-I/include" CPPFLAGS or CFLAGS? I could see it being the former but I used the latter. > > Before you do the configure and build? That is in effect exactly what I did to deal with this issue. I just didn't find it very elegant. But if that is how the package is meant to operate, that is fine. If it were CFLAGS you were promoting the setting of I would be a bit more sticky because RPM wants to have the CFLAGS for its own use: $ rpm --eval="%configure" CFLAGS="${CFLAGS:--O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2}" ; export CFLAGS ; CXXFLAGS="${CXXFLAGS:--O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2}" ; export CXXFLAGS ; FFLAGS="${FFLAGS:--O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2}" ; export FFLAGS ; ./configure --host=x86_64-suse-linux --build=x86_64-suse-linux \ --target=x86_64-suse-linux \ --program-prefix= \ ... And while, yes, you can override CFLAGS and the %configure macro will use it, I'd rather defer the CFLAGS to whatever the vendor has put into the RPM macros file(s). b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rdreier at cisco.com Fri Apr 4 14:15:01 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:15:01 -0700 Subject: [ofa-general] Re: [PATCH 19/20] IB/ipath - add calls to new 7220 code and enable in build In-Reply-To: <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Wed, 02 Apr 2008 15:50:38 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com> Message-ID: > +enum ib_rate ipath_mult_to_ib_rate(unsigned mult) > +{ > + switch (mult) { > + case 8: return IB_RATE_2_5_GBPS; > + case 4: return IB_RATE_5_GBPS; > + case 2: return IB_RATE_10_GBPS; > + case 1: return IB_RATE_20_GBPS; > + default: return IB_RATE_PORT_CURRENT; > + } > +} Looks suspiciously like a copy of the existing mult_to_ib_rate() except it handles fewer cases... is there a reason to copy this? - R. From rdreier at cisco.com Fri Apr 4 14:16:14 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:16:14 -0700 Subject: [ofa-general] Re: [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Wed, 02 Apr 2008 15:50:28 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> Message-ID: > +void ipath_user_sdma_set_complete_counter(struct ipath_user_sdma_queue *pq, > + u32 c) > +{ > + pq->sent_counter = c; > +} This is only used in one file... OK to make it static? From rdreier at cisco.com Fri Apr 4 14:21:30 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:21:30 -0700 Subject: [ofa-general] Re: [PATCH 1/1 v1] MLX4: Added resize_cq capability. 
In-Reply-To: <47F0A5A5.2010208@dev.mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 31 Mar 2008 11:49:41 +0300") References: <47E923CA.90804@dev.mellanox.co.il> <47F0A5A5.2010208@dev.mellanox.co.il> Message-ID: Thanks, I applied this with a lot of changes. Some comments: > entries = roundup_pow_of_two(entries + 1); your patch was corrupted in a very strange way... the context lines had two spaces instead of one at the beginning. I just deleted the extra space by hand. > + err = mlx4_alloc_cq_buf(dev, &cq->resize_buf->buf, entries); > + if (err) { > + spin_lock_irq(&cq->lock); > + kfree(cq->resize_buf); > + cq->resize_buf = NULL; > + spin_unlock_irq(&cq->lock); > + goto out; > + } > +err_buf: > + if (cq->resize_buf) { > + if (!ibcq->uobject) > + mlx4_free_cq_buf(dev, &cq->resize_buf->buf, > + cq->resize_buf->cqe); > + > + spin_lock_irq(&cq->lock); > + kfree(cq->resize_buf); > + cq->resize_buf = NULL; > + spin_unlock_irq(&cq->lock); > + } Why do we need the spinlock in these places? There's no way for this to race with mlx4_ib_poll_one() is there, since that should never see the RESIZE CQE? (If there is such a race, then we're in trouble even with the lock, since we're aborting the resize, and the poll code shouldn't swap the buffers) Also I got rid of the duplicated code to allocate buffers and get userspace buffers, so that the allocate and resize paths use the same code. And I cleaned up some other stuff. So please review/test my work to make sure I didn't break your patch... 
--- drivers/infiniband/hw/mlx4/cq.c | 292 ++++++++++++++++++++++++++++++---- drivers/infiniband/hw/mlx4/main.c | 2 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 9 + drivers/net/mlx4/cq.c | 28 ++++ include/linux/mlx4/cq.h | 2 + 5 files changed, 300 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index e4fb64b..3557e7e 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -93,6 +93,74 @@ int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period) return mlx4_cq_modify(dev->dev, &mcq->mcq, cq_count, cq_period); } +static int mlx4_ib_alloc_cq_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq_buf *buf, int nent) +{ + int err; + + err = mlx4_buf_alloc(dev->dev, nent * sizeof(struct mlx4_cqe), + PAGE_SIZE * 2, &buf->buf); + + if (err) + goto out; + + err = mlx4_mtt_init(dev->dev, buf->buf.npages, buf->buf.page_shift, + &buf->mtt); + if (err) + goto err_buf; + + err = mlx4_buf_write_mtt(dev->dev, &buf->mtt, &buf->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev->dev, &buf->mtt); + +err_buf: + mlx4_buf_free(dev->dev, nent * sizeof(struct mlx4_cqe), + &buf->buf); + +out: + return err; +} + +static void mlx4_ib_free_cq_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq_buf *buf, int cqe) +{ + mlx4_buf_free(dev->dev, (cqe + 1) * sizeof(struct mlx4_cqe), &buf->buf); +} + +static int mlx4_ib_get_cq_umem(struct mlx4_ib_dev *dev, struct ib_ucontext *context, + struct mlx4_ib_cq_buf *buf, struct ib_umem **umem, + u64 buf_addr, int cqe) +{ + int err; + + *umem = ib_umem_get(context, buf_addr, cqe * sizeof (struct mlx4_cqe), + IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(*umem)) + return PTR_ERR(*umem); + + err = mlx4_mtt_init(dev->dev, ib_umem_page_count(*umem), + ilog2((*umem)->page_size), &buf->mtt); + if (err) + goto err_buf; + + err = mlx4_ib_umem_write_mtt(dev, &buf->mtt, *umem); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev->dev, 
&buf->mtt); + +err_buf: + ib_umem_release(*umem); + + return err; +} + struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, struct ib_ucontext *context, struct ib_udata *udata) @@ -100,7 +168,6 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector struct mlx4_ib_dev *dev = to_mdev(ibdev); struct mlx4_ib_cq *cq; struct mlx4_uar *uar; - int buf_size; int err; if (entries < 1 || entries > dev->dev->caps.max_cqes) @@ -112,8 +179,10 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector entries = roundup_pow_of_two(entries + 1); cq->ibcq.cqe = entries - 1; - buf_size = entries * sizeof (struct mlx4_cqe); + mutex_init(&cq->resize_mutex); spin_lock_init(&cq->lock); + cq->resize_buf = NULL; + cq->resize_umem = NULL; if (context) { struct mlx4_ib_create_cq ucmd; @@ -123,21 +192,10 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector goto err_cq; } - cq->umem = ib_umem_get(context, ucmd.buf_addr, buf_size, - IB_ACCESS_LOCAL_WRITE); - if (IS_ERR(cq->umem)) { - err = PTR_ERR(cq->umem); - goto err_cq; - } - - err = mlx4_mtt_init(dev->dev, ib_umem_page_count(cq->umem), - ilog2(cq->umem->page_size), &cq->buf.mtt); + err = mlx4_ib_get_cq_umem(dev, context, &cq->buf, &cq->umem, + ucmd.buf_addr, entries); if (err) - goto err_buf; - - err = mlx4_ib_umem_write_mtt(dev, &cq->buf.mtt, cq->umem); - if (err) - goto err_mtt; + goto err_cq; err = mlx4_ib_db_map_user(to_mucontext(context), ucmd.db_addr, &cq->db); @@ -155,19 +213,9 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector *cq->mcq.set_ci_db = 0; *cq->mcq.arm_db = 0; - if (mlx4_buf_alloc(dev->dev, buf_size, PAGE_SIZE * 2, &cq->buf.buf)) { - err = -ENOMEM; - goto err_db; - } - - err = mlx4_mtt_init(dev->dev, cq->buf.buf.npages, cq->buf.buf.page_shift, - &cq->buf.mtt); + err = mlx4_ib_alloc_cq_buf(dev, &cq->buf, entries); if (err) - goto err_buf; - - err = mlx4_buf_write_mtt(dev->dev, 
&cq->buf.mtt, &cq->buf.buf); - if (err) - goto err_mtt; + goto err_db; uar = &dev->priv_uar; } @@ -195,12 +243,10 @@ err_dbmap: err_mtt: mlx4_mtt_cleanup(dev->dev, &cq->buf.mtt); -err_buf: if (context) ib_umem_release(cq->umem); else - mlx4_buf_free(dev->dev, entries * sizeof (struct mlx4_cqe), - &cq->buf.buf); + mlx4_ib_free_cq_buf(dev, &cq->buf, entries); err_db: if (!context) @@ -212,6 +258,170 @@ err_cq: return ERR_PTR(err); } +static int mlx4_alloc_resize_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq, + int entries) +{ + int err; + + if (cq->resize_buf) + return -EBUSY; + + cq->resize_buf = kmalloc(sizeof *cq->resize_buf, GFP_ATOMIC); + if (!cq->resize_buf) + return -ENOMEM; + + err = mlx4_ib_alloc_cq_buf(dev, &cq->resize_buf->buf, entries); + if (err) { + kfree(cq->resize_buf); + cq->resize_buf = NULL; + return err; + } + + cq->resize_buf->cqe = entries - 1; + + return 0; +} + +static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq, + int entries, struct ib_udata *udata) +{ + struct mlx4_ib_resize_cq ucmd; + int err; + + if (cq->resize_umem) + return -EBUSY; + + if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) + return -EFAULT; + + cq->resize_buf = kmalloc(sizeof *cq->resize_buf, GFP_ATOMIC); + if (!cq->resize_buf) + return -ENOMEM; + + err = mlx4_ib_get_cq_umem(dev, cq->umem->context, &cq->resize_buf->buf, + &cq->resize_umem, ucmd.buf_addr, entries); + if (err) { + kfree(cq->resize_buf); + cq->resize_buf = NULL; + return err; + } + + cq->resize_buf->cqe = entries - 1; + + return 0; +} + +static int mlx4_ib_get_outstanding_cqes(struct mlx4_ib_cq *cq) +{ + u32 i; + + i = cq->mcq.cons_index; + while (get_sw_cqe(cq, i & cq->ibcq.cqe)) + ++i; + + return i - cq->mcq.cons_index; +} + +static void mlx4_ib_cq_resize_copy_cqes(struct mlx4_ib_cq *cq) +{ + struct mlx4_cqe *cqe; + int i; + + i = cq->mcq.cons_index; + cqe = get_cqe(cq, i & cq->ibcq.cqe); + while ((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) != MLX4_CQE_OPCODE_RESIZE) { + 
memcpy(get_cqe_from_buf(&cq->resize_buf->buf, + (i + 1) & cq->resize_buf->cqe), + get_cqe(cq, i & cq->ibcq.cqe), sizeof(struct mlx4_cqe)); + cqe = get_cqe(cq, ++i & cq->ibcq.cqe); + } + ++cq->mcq.cons_index; +} + +int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata) +{ + struct mlx4_ib_dev *dev = to_mdev(ibcq->device); + struct mlx4_ib_cq *cq = to_mcq(ibcq); + int outst_cqe; + int err; + + mutex_lock(&cq->resize_mutex); + + if (entries < 1 || entries > dev->dev->caps.max_cqes) { + err = -EINVAL; + goto out; + } + + entries = roundup_pow_of_two(entries + 1); + if (entries == ibcq->cqe + 1) { + err = 0; + goto out; + } + + if (ibcq->uobject) { + err = mlx4_alloc_resize_umem(dev, cq, entries, udata); + if (err) + goto out; + } else { + /* Can't be smaller then the number of outstanding CQEs */ + outst_cqe = mlx4_ib_get_outstanding_cqes(cq); + if (entries < outst_cqe + 1) { + err = 0; + goto out; + } + + err = mlx4_alloc_resize_buf(dev, cq, entries); + if (err) + goto out; + } + + err = mlx4_cq_resize(dev->dev, &cq->mcq, entries, &cq->resize_buf->buf.mtt); + if (err) + goto err_buf; + + if (ibcq->uobject) { + cq->buf = cq->resize_buf->buf; + cq->ibcq.cqe = cq->resize_buf->cqe; + ib_umem_release(cq->umem); + cq->umem = cq->resize_umem; + + kfree(cq->resize_buf); + cq->resize_buf = NULL; + cq->resize_umem = NULL; + } else { + spin_lock_irq(&cq->lock); + if (cq->resize_buf) { + mlx4_ib_cq_resize_copy_cqes(cq); + mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); + cq->buf = cq->resize_buf->buf; + cq->ibcq.cqe = cq->resize_buf->cqe; + + kfree(cq->resize_buf); + cq->resize_buf = NULL; + } + spin_unlock_irq(&cq->lock); + } + + goto out; + +err_buf: + if (!ibcq->uobject) + mlx4_ib_free_cq_buf(dev, &cq->resize_buf->buf, + cq->resize_buf->cqe); + + kfree(cq->resize_buf); + cq->resize_buf = NULL; + + if (cq->resize_umem) { + ib_umem_release(cq->resize_umem); + cq->resize_umem = NULL; + } + +out: + mutex_unlock(&cq->resize_mutex); + return err; +} + int 
mlx4_ib_destroy_cq(struct ib_cq *cq) { struct mlx4_ib_dev *dev = to_mdev(cq->device); @@ -224,8 +434,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) mlx4_ib_db_unmap_user(to_mucontext(cq->uobject->context), &mcq->db); ib_umem_release(mcq->umem); } else { - mlx4_buf_free(dev->dev, (cq->cqe + 1) * sizeof (struct mlx4_cqe), - &mcq->buf.buf); + mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); mlx4_ib_db_free(dev, &mcq->db); } @@ -332,6 +541,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, u32 g_mlpath_rqpn; u16 wqe_ctr; +repoll: cqe = next_cqe_sw(cq); if (!cqe) return -EAGAIN; @@ -354,6 +564,22 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, return -EINVAL; } + /* Resize CQ in progress */ + if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_CQE_OPCODE_RESIZE)) { + if (cq->resize_buf) { + struct mlx4_ib_dev *dev = to_mdev(cq->ibcq.device); + + mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); + cq->buf = cq->resize_buf->buf; + cq->ibcq.cqe = cq->resize_buf->cqe; + + kfree(cq->resize_buf); + cq->resize_buf = NULL; + } + + goto repoll; + } + if (!*cur_qp || (be32_to_cpu(cqe->my_qpn) & 0xffffff) != (*cur_qp)->mqp.qpn) { /* diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 76dd45c..57885cd 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -571,6 +571,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) (1ull << IB_USER_VERBS_CMD_DEREG_MR) | (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | + (1ull << IB_USER_VERBS_CMD_RESIZE_CQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | (1ull << IB_USER_VERBS_CMD_CREATE_QP) | (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | @@ -610,6 +611,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) ibdev->ib_dev.post_recv = mlx4_ib_post_recv; ibdev->ib_dev.create_cq = mlx4_ib_create_cq; ibdev->ib_dev.modify_cq = mlx4_ib_modify_cq; + ibdev->ib_dev.resize_cq = mlx4_ib_resize_cq; ibdev->ib_dev.destroy_cq = 
mlx4_ib_destroy_cq; ibdev->ib_dev.poll_cq = mlx4_ib_poll_cq; ibdev->ib_dev.req_notify_cq = mlx4_ib_arm_cq; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index ef8ad96..9e63732 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -78,13 +78,21 @@ struct mlx4_ib_cq_buf { struct mlx4_mtt mtt; }; +struct mlx4_ib_cq_resize { + struct mlx4_ib_cq_buf buf; + int cqe; +}; + struct mlx4_ib_cq { struct ib_cq ibcq; struct mlx4_cq mcq; struct mlx4_ib_cq_buf buf; + struct mlx4_ib_cq_resize *resize_buf; struct mlx4_ib_db db; spinlock_t lock; + struct mutex resize_mutex; struct ib_umem *umem; + struct ib_umem *resize_umem; }; struct mlx4_ib_mr { @@ -255,6 +263,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, int mlx4_ib_dereg_mr(struct ib_mr *mr); int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period); +int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata); struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, struct ib_ucontext *context, struct ib_udata *udata); diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index 8c31434..caa5bcf 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -159,6 +159,34 @@ int mlx4_cq_modify(struct mlx4_dev *dev, struct mlx4_cq *cq, } EXPORT_SYMBOL_GPL(mlx4_cq_modify); +int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, + int entries, struct mlx4_mtt *mtt) +{ + struct mlx4_cmd_mailbox *mailbox; + struct mlx4_cq_context *cq_context; + u64 mtt_addr; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + cq_context = mailbox->buf; + memset(cq_context, 0, sizeof *cq_context); + + cq_context->logsize_usrpage = cpu_to_be32(ilog2(entries) << 24); + cq_context->log_page_size = mtt->page_shift - 12; + mtt_addr = mlx4_mtt_addr(dev, mtt); + cq_context->mtt_base_addr_h = mtt_addr >> 32; + 
cq_context->mtt_base_addr_l = cpu_to_be32(mtt_addr & 0xffffffff); + + err = mlx4_MODIFY_CQ(dev, mailbox, cq->cqn, 1); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_cq_resize); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) { diff --git a/include/linux/mlx4/cq.h b/include/linux/mlx4/cq.h index f7c3511..071cf96 100644 --- a/include/linux/mlx4/cq.h +++ b/include/linux/mlx4/cq.h @@ -132,5 +132,7 @@ enum { int mlx4_cq_modify(struct mlx4_dev *dev, struct mlx4_cq *cq, u16 count, u16 period); +int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, + int entries, struct mlx4_mtt *mtt); #endif /* MLX4_CQ_H */ -- 1.5.4.5 From richard.frank at oracle.com Fri Apr 4 15:21:59 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 17:21:59 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <47F37CA4.8000109@mellanox.co.il> <47F68EDC.4050107@oracle.com> Message-ID: <47F6AA07.70706@oracle.com> Roland Dreier wrote: > > We are very interested in these new operations and are moving in the > > direction of tightly integrating RDMA along with atomics (if > > available) into Oracle. We plan on testing some early prototypes of > > the these in the few months. > > And you need the ConnectX-only masked atomics? Or do the standard IB > atomic operations work for you? Of course using atomics at all means > that things don't work on iWARP. > > We specifically asked for the masked operations. Yes, this means Oracle will not get the performance boost of atomics on IWARP - but we still get rdma - and that's a real win / benefit for Oracle today - and more so over the next few months. > > Send with invalidate is an exact match for our current RDS V3 rdma > > driver - and should be more efficient than the current background > > syncing of the tpt to ensure keys are invalidated. 
> > How does send with invalidate interact with the current IB FMR stuff? > Seems that you would run into trouble keeping the state of the FMR > straight if the remote side is invalidating them. > > The model we implement is based on "use once" keys - we issue the key to the rdma server and want to toss it as soon as the rdma is complete. Today, we explicitly free the key after the rdma completes and we get a message from the rdma server - saying rdma is complete. If the key is auto invalidated by the recv'ing HCA then we do not need to do it in the driver... which also means we do not need to issue the sync tpts to force the HCA to update its cache. At least this is how I think it works - Olaf is the divine source here. > Also I would think that send-with-invalidate would be much more > expensive than the current FMR method of batching up the invalidates, > since you don't get to amortize the cost of syncing up all the internal > HCA state. > > This is the one piece we do not know - our plans are to test this and see where the trade offs are. We will keep the current design / implementation to run over NICs that do not support send-with-invalidate. > - R. > From ralph.campbell at qlogic.com Fri Apr 4 14:30:51 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 04 Apr 2008 14:30:51 -0700 Subject: [ofa-general] Re: [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> Message-ID: <1207344651.8715.14.camel@brick.pathscale.com> On Fri, 2008-04-04 at 14:12 -0700, Roland Dreier wrote: > By the way... > > > +int ipath_user_sdma_pkt_sent(const struct ipath_user_sdma_queue *pq, > > + u32 counter) > > +{ > > + const u32 scounter = ipath_user_sdma_complete_counter(pq); > > + const s32 dcounter = scounter - counter; > > + > > + return dcounter >= 0; > > +} > > I don't see this called anywhere... should I just delete it? Yes. 
You can remove it. From ralph.campbell at qlogic.com Fri Apr 4 14:42:24 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 04 Apr 2008 14:42:24 -0700 Subject: [ofa-general] Re: [PATCH 19/20] IB/ipath - add calls to new 7220 code and enable in build In-Reply-To: References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com> Message-ID: <1207345344.8715.21.camel@brick.pathscale.com> On Fri, 2008-04-04 at 14:15 -0700, Roland Dreier wrote: > > +enum ib_rate ipath_mult_to_ib_rate(unsigned mult) > > +{ > > + switch (mult) { > > + case 8: return IB_RATE_2_5_GBPS; > > + case 4: return IB_RATE_5_GBPS; > > + case 2: return IB_RATE_10_GBPS; > > + case 1: return IB_RATE_20_GBPS; > > + default: return IB_RATE_PORT_CURRENT; > > + } > > +} > > Looks suspiciously like a copy of the existing mult_to_ib_rate() except > it handles fewer cases... is there a reason to copy this? > > - R. It looks similar but the values are reversed. This is converting the ib_rate enum to a multiplier of the DDR clock rate which is used as a counter to delay packets. So IB_RATE_2_5_GBPS is 8 times slower than IB_RATE_20_GBPS. The standard functions map the enum to a multiplier of the slowest rate so IB_RATE_2_5_GBPS is one. If I used the standard functions, I would still need a lookup table to map 8->1, 1->8, etc. 
From ralph.campbell at qlogic.com Fri Apr 4 14:44:03 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 04 Apr 2008 14:44:03 -0700 Subject: [ofa-general] Re: [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> Message-ID: <1207345443.8715.23.camel@brick.pathscale.com> On Fri, 2008-04-04 at 14:16 -0700, Roland Dreier wrote: > > +void ipath_user_sdma_set_complete_counter(struct ipath_user_sdma_queue *pq, > > + u32 c) > > +{ > > + pq->sent_counter = c; > > +} > > This is only used in one file... OK to make it static? Yes, thanks. From bs at q-leap.de Fri Apr 4 14:45:54 2008 From: bs at q-leap.de (Bernd Schubert) Date: Fri, 4 Apr 2008 23:45:54 +0200 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <1207331721.15625.76.camel@hrosenstock-ws.xsigo.com> References: <200804041147.27565.bs@q-leap.de> <1207331721.15625.76.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080404214553.GA15927@lanczos.q-leap.de> On Fri, Apr 04, 2008 at 10:55:21AM -0700, Hal Rosenstock wrote: > On Fri, 2008-04-04 at 11:47 +0200, Bernd Schubert wrote: > > Hello, > > > > opensm-3.2.1 logs some error messages like this: > > > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > > ERR 0108: Unknown remote side for node 0 > > x000b8cffff002ba2(SW_pfs1_leaf4) port 13. 
Adding to light sweep sampling list > > Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 hop path: > > Path = 0,1,14,13 > > > > > > From ibnetdiscover output I see port13 of this switch is a switch-interconnect > > (sorry, I don't know what the correct name/identifier for switches within > > switches): > > > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid 263 > > 4xSDR > > > > > > Apr 04 00:00:08 325219 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > > ERR 0108: Unknown remote side for node 0 > > x000b8cffff002bf9(SW_pfs1_inter6) port 9. Adding to light sweep sampling list > > Apr 04 00:00:08 325234 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: > > Path = 0,1,18 > > > > This is again an interconnection: > > > > [9] "S-000b8cffff002b9e"[15] # "SW_pfs1_leaf1" lid 177 > > 4xDDR > > > > > > Apr 04 00:00:08 325288 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > > ERR 0108: Unknown remote side for node 0 > > x000b8cffff002bfa(SW_pfs1_inter7) port 13. Adding to light sweep sampling list > > Apr 04 00:00:08 325301 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: > > Path = 0,1,14 > > > > > > And again an interconnection: > > > > [13] "S-000b8cffff002ba2"[13] # "SW_pfs1_leaf4" lid 182 > > 4xDDR > > > > > > All the other interconnections seem to be fine. > > Any idea if OpenSM 3.1.10 has the same issue as 3.2.1 ? Yes, from the log file I see these messages also did happen with opensm-3.1.10. > > Is this some large Flextronics switch ? Again you are right, this is a Flextronics F-X430075, presently with 144 ports. 
Thanks, Bernd From rdreier at cisco.com Fri Apr 4 14:47:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:47:17 -0700 Subject: [ofa-general] Re: [PATCH 19/20] IB/ipath - add calls to new 7220 code and enable in build In-Reply-To: <1207345344.8715.21.camel@brick.pathscale.com> (Ralph Campbell's message of "Fri, 04 Apr 2008 14:42:24 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com> <1207345344.8715.21.camel@brick.pathscale.com> Message-ID: > It looks similar but the values are reversed. This is converting > the ib_rate enum to a multiplier of the DDR clock rate which is > used as a counter to delay packets. So IB_RATE_2_5_GBPS is 8 > times slower than IB_RATE_20_GBPS. The standard functions map > the enum to a multiplier of the slowest rate so > IB_RATE_2_5_GBPS is one. If I used the standard functions, I would > still need a lookup table to map 8->1, 1->8, etc. OK, got it thanks From sfr at canb.auug.org.au Fri Apr 4 14:48:32 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Sat, 5 Apr 2008 08:48:32 +1100 Subject: [ofa-general] linux-next: infiniband build failure In-Reply-To: References: <20080404133204.3edc0470.sfr@canb.auug.org.au> Message-ID: <20080405084832.5e4a0c53.sfr@canb.auug.org.au> Hi Roland, On Fri, 04 Apr 2008 08:47:29 -0700 Roland Dreier wrote: > > > drivers/infiniband/hw/ehca/ehca_reqs.c: In function 'ehca_write_swqe': > > drivers/infiniband/hw/ehca/ehca_reqs.c:191: error: 'const struct ib_send_wr' has no member named 'imm_data' > > Oops, thanks, I forgot to run my cross-compile (and ehca is ppc only). > > Anyway, your fix is correct and I rolled it into my patch. Thanks. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From rdreier at cisco.com Fri Apr 4 15:02:15 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 15:02:15 -0700 Subject: [ofa-general] Re: [PATCH] mthca: update QP state after query QP In-Reply-To: <200803271636.00414.dotanb@dev.mellanox.co.il> (Dotan Barak's message of "Thu, 27 Mar 2008 16:36:00 +0200") References: <200803271636.00414.dotanb@dev.mellanox.co.il> Message-ID: thanks, applied From rdreier at cisco.com Fri Apr 4 15:04:13 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 15:04:13 -0700 Subject: [ofa-general] Re: [PATCH] mlx4: update QP state after query QP In-Reply-To: <200803271708.41638.dotanb@dev.mellanox.co.il> (Dotan Barak's message of "Thu, 27 Mar 2008 17:08:41 +0200") References: <200803271708.41638.dotanb@dev.mellanox.co.il> Message-ID: thanks, applied From clameter at sgi.com Fri Apr 4 15:06:18 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 4 Apr 2008 15:06:18 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] mmu notifier #v11 In-Reply-To: <20080404202055.GA14784@duo.random> References: <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> <20080404202055.GA14784@duo.random> Message-ID: I am always the guy doing the cleanup after Andrea it seems. Sigh. Here is the mm_lock/mm_unlock logic separated out for easier review. Adds some comments. Still objectionable is the multiple ways of invalidating pages in #v11. Callout now has similar locking to emm. From: Christoph Lameter Subject: mm_lock: Lock a process against reclaim Provide a way to lock an mm_struct against reclaim (try_to_unmap etc). This is necessary for the invalidate notifier approaches so that they can reliably add and remove a notifier. 
Signed-off-by: Andrea Arcangeli Signed-off-by: Christoph Lameter --- include/linux/mm.h | 10 ++++++++ mm/mmap.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 76 insertions(+) Index: linux-2.6/include/linux/mm.h =================================================================== --- linux-2.6.orig/include/linux/mm.h 2008-04-02 11:41:47.741678873 -0700 +++ linux-2.6/include/linux/mm.h 2008-04-04 15:02:17.660504756 -0700 @@ -1050,6 +1050,16 @@ extern int install_special_mapping(struc unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +/* + * Locking and unlocking an mm against reclaim. + * + * mm_lock will take mmap_sem writably (to prevent additional vmas from being + * added) and then take all mapping locks of the existing vmas. With that + * reclaim is effectively stopped. + */ +extern void mm_lock(struct mm_struct *mm); +extern void mm_unlock(struct mm_struct *mm); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, Index: linux-2.6/mm/mmap.c =================================================================== --- linux-2.6.orig/mm/mmap.c 2008-04-04 14:55:03.477593980 -0700 +++ linux-2.6/mm/mmap.c 2008-04-04 14:59:05.505395402 -0700 @@ -2242,3 +2242,69 @@ int install_special_mapping(struct mm_st return 0; } + +static void mm_lock_unlock(struct mm_struct *mm, int lock) +{ + struct vm_area_struct *vma; + spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; + + i_mmap_lock_last = NULL; + for (;;) { + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) + if (vma->vm_file && vma->vm_file->f_mapping && + (unsigned long) i_mmap_lock > + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock && + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock > + (unsigned long) i_mmap_lock_last) + i_mmap_lock = + 
&vma->vm_file->f_mapping->i_mmap_lock; + if (i_mmap_lock == (spinlock_t *) -1UL) + break; + i_mmap_lock_last = i_mmap_lock; + if (lock) + spin_lock(i_mmap_lock); + else + spin_unlock(i_mmap_lock); + } + + anon_vma_lock_last = NULL; + for (;;) { + spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) + if (vma->anon_vma && + (unsigned long) anon_vma_lock > + (unsigned long) &vma->anon_vma->lock && + (unsigned long) &vma->anon_vma->lock > + (unsigned long) anon_vma_lock_last) + anon_vma_lock = &vma->anon_vma->lock; + if (anon_vma_lock == (spinlock_t *) -1UL) + break; + anon_vma_lock_last = anon_vma_lock; + if (lock) + spin_lock(anon_vma_lock); + else + spin_unlock(anon_vma_lock); + } +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. 
+ */ +void mm_lock(struct mm_struct * mm) +{ + down_write(&mm->mmap_sem); + mm_lock_unlock(mm, 1); +} + +void mm_unlock(struct mm_struct *mm) +{ + mm_lock_unlock(mm, 0); + up_write(&mm->mmap_sem); +} From rdreier at cisco.com Fri Apr 4 15:07:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 15:07:25 -0700 Subject: [ofa-general] [PATCH 2 of 2] mlx4: update module version and release date (for 2.6.25) In-Reply-To: <200802271620.53952.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 27 Feb 2008 16:20:53 +0200") References: <200802271620.53952.jackm@dev.mellanox.co.il> Message-ID: thanks, applied both this and mthca equivalent From bs at q-leap.de Fri Apr 4 15:12:39 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sat, 5 Apr 2008 00:12:39 +0200 Subject: [ofa-general] XmtDiscards Message-ID: <200804050012.39893.bs@q-leap.de> Hello, after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten much better there, at least no further RcvSwRelayErrors, even when the cluster is in idle state and so far also no SymbolErrors, which we also have seen before. However, after I just started a lustre stress test on 50 clients (to a lustre storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about 9000 XmtDiscards within 30 minutes. Searching for this error I find "This is a symptom of congestion and may require tweaking either HOQ or switch lifetime values". Well, I have to admit I neither know what HOQ is, nor do I know how to tweak it. I also have no idea how to set switch lifetime values. I guess this isn't related to the opensm timeout option, is it? Hmm, I just found a Cisco PDF describing how to set the lifetime on these switches, but is this also possible on Flextronics switches? 
Thanks for any help, Bernd -- Bernd Schubert Q-Leap Networks GmbH From rdreier at cisco.com Fri Apr 4 15:27:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 15:27:06 -0700 Subject: [ofa-general] Re: [PATCH] mlx4: make firmware diagnostic counters available via sysfs In-Reply-To: <200804021615.44982.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 2 Apr 2008 16:15:44 +0300") References: <200804021615.44982.jackm@dev.mellanox.co.il> Message-ID: > +int mlx4_query_diag_counters(struct mlx4_dev *dev, int array_length, > + int in_modifier, unsigned int in_offset[], > + u32 counter_out[]) > +{ > + struct mlx4_cmd_mailbox *mailbox; > + u32 *outbox; > + u32 op_modifer = (u32)in_modifier; This coding style looks strange to me... you have an int parameter in_modifier that is not used for anything except to assign it to a u32 op_modifer [sic] variable with a (u32) cast that doesn't do anything. Why not just have op_modifier be the parameter in the first place? Also the array_length stuff looks kind of funny since you only ever pass in a value of 1... why not just pass in int offset and u32 *counter? > + /* clear counters file, can't read it */ > + if(offset < 0) > + return sprintf(buf,"This file is write only\n"); Why not just set the permissions on the file so it can't be opened for reading? This just looks like a recipe for making userspace code go crazy on unexpected input. Also watch out for the space in "if (" And if I'm understanding correctly, you use a magic offset of -1 for the clear_diag attribute that makes mlx4_query_diag_counters() read before the beginning of the output mailbox. > +err_diag: > + ib_unregister_device(&ibdev->ib_dev); > + > err_reg: > ib_unregister_device(&ibdev->ib_dev); This doesn't look like a good idea. - R. 
From boris at mellanox.com Fri Apr 4 15:28:46 2008 From: boris at mellanox.com (Boris Shpolyansky) Date: Fri, 4 Apr 2008 15:28:46 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <200804050012.39893.bs@q-leap.de> Message-ID: <1E3DCD1C63492545881FACB6063A57C1023F6AE8@mtiexch01.mti.com> Hi Bernd, You can configure the HOQ (Head-Of-Queue-Lifetime) value programmed in any switch in the fabric managed by OpenSM following these simple steps: 1. Stop the SM /etc/init.d/opensmd stop 2. Run the SM manually with the "-c" option (to dump its default configuration to a file) opensm -c 3. Kill the SM with ^C 4. The configuration is saved in /var/cache/opensm/opensm.opts. Open the file and look for head_of_queue_lifetime. Change the value and save the file. 5. Restart the SM /etc/init.d/opensmd start P.S. You might find 'opensm -h' and 'man opensm' useful. Hope this helps, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Bernd Schubert Sent: Friday, April 04, 2008 3:13 PM To: OpenIB Subject: [ofa-general] XmtDiscards Hello, after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten much better there, at least no further RcvSwRelayErrors, even when the cluster is in idle state and so far also no SymbolErrors, which we also have seens before. However, after I just started a lustre stress test on 50 clients (to a lustre storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about 9000 XmtDiscards within 30 minutes. Searching for this error I find "This is a symptom of congestion and may require tweaking either HOQ or switch lifetime values". Well, I have to admit I neither know what HOQ is, nor do I know how to tweak it. 
I also do not have an idea to set switch lifetime values. I guess this isn't related to the opensm timeout option, is it? Hmm, I just found a cisci pdf describing how to set the lifetime on these switches, but is this also possible on Flextronics switches? Thanks for any help, Bernd -- Bernd Schubert Q-Leap Networks GmbH _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From weiny2 at llnl.gov Fri Apr 4 15:29:32 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 4 Apr 2008 15:29:32 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <200804050012.39893.bs@q-leap.de> References: <200804050012.39893.bs@q-leap.de> Message-ID: <20080404152932.5e294e47.weiny2@llnl.gov> On Sat, 5 Apr 2008 00:12:39 +0200 Bernd Schubert wrote: > Hello, > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > much better there, at least no further RcvSwRelayErrors, even when the > cluster is in idle state and so far also no SymbolErrors, which we also have > seens before. > > However, after I just started a lustre stress test on 50 clients (to a lustre > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > 9000 XmtDiscards within 30 minutes. Yea, those are bad. > > Searching for this error I find "This is a symptom of congestion and may > require tweaking either HOQ or switch lifetime values". > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > it. I also do not have an idea to set switch lifetime values. I guess this > isn't related to the opensm timeout option, is it? Yes you should adjust these values. > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > switches, but is this also possible on Flextronics switches? 
> I don't know about the Vendor SMs but in opensm look for the following options in the opensm.opts file (Default path is: /var/cache/opensm): # The code of maximal time a packet can wait at the head of # transmission queue. # The actual time is 4.096usec * 2^ # The value 0x14 disables this mechanism head_of_queue_lifetime 0x12 # The maximal time a packet can wait at the head of queue on # switch port connected to a CA or router port leaf_head_of_queue_lifetime 0x0c Ira From clameter at sgi.com Fri Apr 4 15:30:48 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:48 -0700 Subject: [ofa-general] [patch 00/10] [RFC] EMM Notifier V3 Message-ID: <20080404223048.374852899@sgi.com> V2->V3: - Fix rcu issues - Fix emm_referenced handling - Use Andrea's mm_lock/unlock to prevent registration races. - Keep simple API since there does not seem to be a need to add additional callbacks (mm_lock does not require callbacks like emm_start/stop that I envisioned). - Reduce CC list (the volume we are producing here must be annoying...). V1->V2: - Additional optimizations in the VM - Convert vm spinlocks to rw sems. - Add XPMEM driver (requires sleeping in callbacks) - Add XPMEM example This patch implements a simple callback for device drivers that establish their own references to pages (KVM, GRU, XPmem, RDMA/Infiniband, DMA engines etc). These references are unknown to the VM (therefore external). With these callbacks it is possible for the device driver to release external references when the VM requests it. This enables swapping, page migration and allows support of remapping, permission changes etc etc for the externally mapped memory. With this functionality it becomes also possible to avoid pinning or mlocking pages (commonly done to stop the VM from unmapping device mapped pages). 
A device driver must subscribe to a process using emm_register_notifier(struct emm_notifier *, struct mm_struct *) The VM will then perform callbacks for operations that unmap or change permissions of pages in that address space. When the process terminates the callback function is called with emm_release. Callbacks are performed before and after the unmapping action of the VM. emm_invalidate_start before emm_invalidate_end after The device driver must hold off establishing new references to pages in the range specified between a callback with emm_invalidate_start and the subsequent call with emm_invalidate_end set. This allows the VM to ensure that no concurrent driver actions are performed on an address range while performing remapping or unmapping operations. This patchset contains additional modifications needed to ensure that the callbacks can sleep. For that purpose two key locks in the vm need to be converted to rw_sems. These patches are brand new, invasive and need extensive discussion and evaluation. The first patch alone may be applied if callbacks in atomic context are sufficient for a device driver (likely the case for KVM and GRU and simple DMA drivers). Following the VM modifications is the XPMEM device driver that allows sharing of memory between processes running on different instances of Linux. This is also a prototype. It is known to run trivial sample programs included as the last patch. -- From clameter at sgi.com Fri Apr 4 15:30:49 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:49 -0700 Subject: [ofa-general] [patch 01/10] emm: mm_lock: Lock a process against reclaim References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223131.271668133@sgi.com> An embedded and charset-unspecified text was scrubbed... 
Name: mm_lock_unlock URL: From clameter at sgi.com Fri Apr 4 15:30:54 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:54 -0700 Subject: [ofa-general] [patch 06/10] emm: Convert anon_vma lock to rw_sem and refcount References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223132.477298248@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_anon_vma_sem URL: From clameter at sgi.com Fri Apr 4 15:30:52 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:52 -0700 Subject: [ofa-general] [patch 04/10] emm: Convert i_mmap_lock to i_mmap_sem References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223131.999993077@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_immap_sem URL: From clameter at sgi.com Fri Apr 4 15:30:58 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:58 -0700 Subject: [ofa-general] [patch 10/10] xpmem: Simple example References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223133.463091757@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_test URL: From clameter at sgi.com Fri Apr 4 15:30:50 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:50 -0700 Subject: [ofa-general] [patch 02/10] emm: notifier logic References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223131.469710551@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_notifier URL: From clameter at sgi.com Fri Apr 4 15:30:56 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:56 -0700 Subject: [ofa-general] [patch 08/10] xpmem: Locking rules for taking multiple mmap_sem locks. References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223132.971442620@sgi.com> An embedded and charset-unspecified text was scrubbed... 
Name: xpmem_v003_lock-rule URL: From clameter at sgi.com Fri Apr 4 15:30:51 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:51 -0700 Subject: [ofa-general] [patch 03/10] emm: Move tlb flushing into free_pgtables References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223131.727813758@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: move_tlb_flush URL: From clameter at sgi.com Fri Apr 4 15:30:57 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:57 -0700 Subject: [ofa-general] [patch 09/10] xpmem: The device driver References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223133.216189171@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_v003_emm_SSI_v3 URL: From clameter at sgi.com Fri Apr 4 15:30:53 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:53 -0700 Subject: [ofa-general] [patch 05/10] emm: Remove tlb pointer from the parameters of unmap vmas References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223132.259410373@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: cleanup_unmap_vmas URL: From clameter at sgi.com Fri Apr 4 15:30:55 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:55 -0700 Subject: [ofa-general] [patch 07/10] xpmem: This patch exports zap_page_range as it is needed by XPMEM. References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223132.734091146@sgi.com> An embedded and charset-unspecified text was scrubbed... 
Name: xpmem_v003_export-zap_page_range URL: From jeremy at goop.org Fri Apr 4 16:12:42 2008 From: jeremy at goop.org (Jeremy Fitzhardinge) Date: Fri, 04 Apr 2008 16:12:42 -0700 Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim In-Reply-To: <20080404223131.271668133@sgi.com> References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> Message-ID: <47F6B5EA.6060106@goop.org> Christoph Lameter wrote: > Provide a way to lock an mm_struct against reclaim (try_to_unmap > etc). This is necessary for the invalidate notifier approaches so > that they can reliably add and remove a notifier. > > Signed-off-by: Andrea Arcangeli > Signed-off-by: Christoph Lameter > > --- > include/linux/mm.h | 10 ++++++++ > mm/mmap.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 76 insertions(+) > > Index: linux-2.6/include/linux/mm.h > =================================================================== > --- linux-2.6.orig/include/linux/mm.h 2008-04-02 11:41:47.741678873 -0700 > +++ linux-2.6/include/linux/mm.h 2008-04-04 15:02:17.660504756 -0700 > @@ -1050,6 +1050,16 @@ extern int install_special_mapping(struc > unsigned long addr, unsigned long len, > unsigned long flags, struct page **pages); > > +/* > + * Locking and unlocking an mm against reclaim. > + * > + * mm_lock will take mmap_sem writably (to prevent additional vmas from being > + * added) and then take all mapping locks of the existing vmas. With that > + * reclaim is effectively stopped. 
> + */ > +extern void mm_lock(struct mm_struct *mm); > +extern void mm_unlock(struct mm_struct *mm); > + > extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); > > extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, > Index: linux-2.6/mm/mmap.c > =================================================================== > --- linux-2.6.orig/mm/mmap.c 2008-04-04 14:55:03.477593980 -0700 > +++ linux-2.6/mm/mmap.c 2008-04-04 14:59:05.505395402 -0700 > @@ -2242,3 +2242,69 @@ int install_special_mapping(struct mm_st > > return 0; > } > + > +static void mm_lock_unlock(struct mm_struct *mm, int lock) > +{ > + struct vm_area_struct *vma; > + spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; > + > + i_mmap_lock_last = NULL; > + for (;;) { > + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; > + for (vma = mm->mmap; vma; vma = vma->vm_next) > + if (vma->vm_file && vma->vm_file->f_mapping && > I think you can break this if() down a bit: if (!(vma->vm_file && vma->vm_file->f_mapping)) continue; > + (unsigned long) i_mmap_lock > > + (unsigned long) > + &vma->vm_file->f_mapping->i_mmap_lock && > + (unsigned long) > + &vma->vm_file->f_mapping->i_mmap_lock > > + (unsigned long) i_mmap_lock_last) > + i_mmap_lock = > + &vma->vm_file->f_mapping->i_mmap_lock; > So this is an O(n^2) algorithm to take the i_mmap_locks from low to high order? A comment would be nice. And O(n^2)? Ouch. How often is it called? And is it necessary to mush lock and unlock together? Unlock ordering doesn't matter, so you should just be able to have a much simpler loop, no? 
> + if (i_mmap_lock == (spinlock_t *) -1UL) > + break; > + i_mmap_lock_last = i_mmap_lock; > + if (lock) > + spin_lock(i_mmap_lock); > + else > + spin_unlock(i_mmap_lock); > + } > + > + anon_vma_lock_last = NULL; > + for (;;) { > + spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; > + for (vma = mm->mmap; vma; vma = vma->vm_next) > + if (vma->anon_vma && > + (unsigned long) anon_vma_lock > > + (unsigned long) &vma->anon_vma->lock && > + (unsigned long) &vma->anon_vma->lock > > + (unsigned long) anon_vma_lock_last) > + anon_vma_lock = &vma->anon_vma->lock; > + if (anon_vma_lock == (spinlock_t *) -1UL) > + break; > + anon_vma_lock_last = anon_vma_lock; > + if (lock) > + spin_lock(anon_vma_lock); > + else > + spin_unlock(anon_vma_lock); > + } > +} > > + > +/* > + * This operation locks against the VM for all pte/vma/mm related > + * operations that could ever happen on a certain mm. This includes > + * vmtruncate, try_to_unmap, and all page faults. The holder > + * must not hold any mm related lock. A single task can't take more > + * than one mm lock in a row or it would deadlock. > + */ > +void mm_lock(struct mm_struct * mm) > +{ > + down_write(&mm->mmap_sem); > + mm_lock_unlock(mm, 1); > +} > + > +void mm_unlock(struct mm_struct *mm) > +{ > + mm_lock_unlock(mm, 0); > + up_write(&mm->mmap_sem); > +} > > From bs at q-leap.de Fri Apr 4 16:21:11 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sat, 5 Apr 2008 01:21:11 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1023F6AE8@mtiexch01.mti.com> References: <200804050012.39893.bs@q-leap.de> <1E3DCD1C63492545881FACB6063A57C1023F6AE8@mtiexch01.mti.com> Message-ID: <20080404232111.GA17576@lanczos.q-leap.de> Hello Boris, On Fri, Apr 04, 2008 at 03:28:46PM -0700, Boris Shpolyansky wrote: > Hi Bernd, > > You can configure the HOQ (Head-Of-Queue-Lifetime) value programmed in > any switch in the fabric managed by OpenSM following these simple steps: > > 1. 
Stop the SM > /etc/init.d/opensmd stop > > 2. Run the SM manually with the "-c" option (to dump its default > configuration to a file) > opensm -c > > 3. Kill the SM with ^C > > 4. The configuration is saved in /var/cache/opensm/opensm.opts. Open the > file and look for head_of_queue_lifetime. Change the value and save the > file. > > 5. Restart the SM > /etc/init.d/opensmd start thanks a lot for your help. This did help quite a lot. > > P.S. You might find 'opensm -h' and 'man opensm' useful. Sorry about my dumb question, I did read the man page of opensm quite often already, but "--cache-options" and "OSM_CACHE_DIR" did activate my brain-internal filter to entirely skip this part of the man page ;) Somehow I associated "cache" with "opensm-performance", but not at all with options... Thanks again, Bernd From arlin.r.davis at intel.com Fri Apr 4 16:40:43 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Fri, 4 Apr 2008 16:40:43 -0700 Subject: [ofa-general] [PATCH 2/4][v2] dapl: add support for logging errors in non-debug build. Message-ID: Add debug logging (stdout, syslog) for error cases during device open, cm, async, and dto operations. Default settings are ERR for DAPL_DBG_TYPE, and stdout for DAPL_DBG_DEST. Change default configuration to build non-debug. 
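For anyone wanting to exercise the new logging at runtime: dapl_init() in this series reads two environment variables. The variable names below come from the diff; the numeric values are assumptions to be checked against dapl_debug.h, and the application name is a placeholder.

```shell
# Hypothetical usage of the runtime logging knobs from this patch:
export DAPL_DBG_TYPE=0xffff   # bitmask of message types; default is ERR only
export DAPL_DBG_DEST=0x2      # 0x2 = DAPL_DBG_DEST_SYSLOG; stdout is the default
./your_dapl_application
```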
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- configure.in | 4 +- dapl/common/dapl_debug.c | 2 - dapl/common/dapl_evd_util.c | 8 +- dapl/include/dapl_debug.h | 10 ++- dapl/openib_cma/dapl_ib_cm.c | 196 +++++++++++++++++++++++----------------- dapl/openib_cma/dapl_ib_util.c | 87 +++++++++--------- dapl/udapl/dapl_init.c | 16 +++- dapl/udapl/linux/dapl_osd.h | 2 +- 8 files changed, 179 insertions(+), 146 deletions(-) diff --git a/configure.in b/configure.in index eaf597b..d1c2664 100644 --- a/configure.in +++ b/configure.in @@ -42,12 +42,12 @@ AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") dnl Support debug mode build - if enable-debug provided the DEBUG variable is set AC_ARG_ENABLE(debug, -[ --enable-debug Turn on debug mode, default=on], +[ --enable-debug Turn on debug mode, default=off], [case "${enableval}" in yes) debug=true ;; no) debug=false ;; *) AC_MSG_ERROR(bad value ${enableval} for --enable-debug) ;; -esac],[debug=true]) +esac],[debug=false]) AM_CONDITIONAL(DEBUG, test x$debug = xtrue) dnl Support ib_extension build - if enable-ext-type == ib diff --git a/dapl/common/dapl_debug.c b/dapl/common/dapl_debug.c index 7ddce52..cbc356c 100644 --- a/dapl/common/dapl_debug.c +++ b/dapl/common/dapl_debug.c @@ -32,7 +32,6 @@ #include #endif /* __KDAPL__ */ -#ifdef DAPL_DBG DAPL_DBG_TYPE g_dapl_dbg_type; /* initialized in dapl_init.c */ DAPL_DBG_DEST g_dapl_dbg_dest; /* initialized in dapl_init.c */ @@ -117,5 +116,4 @@ void dapl_dump_cntr( int cntr ) } #endif /* DAPL_COUNTERS */ -#endif diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index a993b02..2ae1b59 100755 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -1209,10 +1209,10 @@ dapli_evd_cqe_to_event ( dapl_os_unlock ( &ep_ptr->header.lock ); } - dapl_dbg_log (DAPL_DBG_TYPE_DTO_COMP_ERR, - " DTO completion ERROR: %d: op %#x (ep disconnected)\n", - DAPL_GET_CQE_STATUS (cqe_ptr), - DAPL_GET_CQE_OPTYPE (cqe_ptr)); + 
dapl_log(DAPL_DBG_TYPE_ERR, + "DTO completion ERR: status %d, opcode %s \n", + DAPL_GET_CQE_STATUS(cqe_ptr), + DAPL_GET_CQE_OP_STR(cqe_ptr)); } } diff --git a/dapl/include/dapl_debug.h b/dapl/include/dapl_debug.h index 76db8fd..f0de7c8 100644 --- a/dapl/include/dapl_debug.h +++ b/dapl/include/dapl_debug.h @@ -75,14 +75,16 @@ typedef enum DAPL_DBG_DEST_SYSLOG = 0x0002, } DAPL_DBG_DEST; - -#if defined(DAPL_DBG) - extern DAPL_DBG_TYPE g_dapl_dbg_type; extern DAPL_DBG_DEST g_dapl_dbg_dest; +extern void dapl_internal_dbg_log(DAPL_DBG_TYPE type, const char *fmt, ...); + +#define dapl_log g_dapl_dbg_type==0 ? (void) 1 : dapl_internal_dbg_log + +#if defined(DAPL_DBG) + #define dapl_dbg_log g_dapl_dbg_type==0 ? (void) 1 : dapl_internal_dbg_log -extern void dapl_internal_dbg_log ( DAPL_DBG_TYPE type, const char *fmt, ...); #else /* !DAPL_DBG */ diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index a040ffb..33f299d 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -95,9 +95,9 @@ static void dapli_addr_resolve(struct dapl_cm_id *conn) ret = rdma_resolve_route(conn->cm_id, conn->route_timeout); if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " rdma_connect failed: %s\n",strerror(errno)); - + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_connect: rdma_resolve_route ERR %d %s\n", + ret, strerror(errno)); dapl_evd_connection_callback(conn, IB_CME_LOCAL_FAILURE, NULL, conn->ep); @@ -146,8 +146,9 @@ static void dapli_route_resolve(struct dapl_cm_id *conn) ret = rdma_connect(conn->cm_id, &conn->params); if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, " rdma_connect failed: %s\n", - strerror(errno)); + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_connect: rdma_connect ERR %d %s\n", + ret, strerror(errno)); goto bail; } return; @@ -310,12 +311,15 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, case RDMA_CM_EVENT_UNREACHABLE: case RDMA_CM_EVENT_CONNECT_ERROR: { - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " dapli_cm_active_handler: 
CONN_ERR " - " event=0x%x status=%d %s\n", + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cma_active: CONN_ERR event=0x%x" + " status=%d %s DST %s, %d\n", event->event, event->status, - (event->status == -ETIMEDOUT)?"TIMEOUT":"" ); + (event->status == -ETIMEDOUT)?"TIMEOUT":"", + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); /* per DAT SPEC provider always returns UNREACHABLE */ dapl_evd_connection_callback(conn, @@ -327,36 +331,47 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, { ib_cm_events_t cm_event; - /* no device type specified so assume IB for now */ - if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ - cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; - else - cm_event = IB_CME_DESTINATION_REJECT; - dapl_dbg_log( DAPL_DBG_TYPE_CM, " dapli_cm_active_handler: REJECTED reason=%d\n", event->status); - + + /* valid REJ from consumer will always contain private data */ + if (event->status == 28 && + event->param.conn.private_data_len) + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else { + cm_event = IB_CME_DESTINATION_REJECT; + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cma_active: non-consumer REJ," + " reason=%d, DST %s, %d\n", + event->status, + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); + } dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep); break; } case RDMA_CM_EVENT_ESTABLISHED: - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " active_cb: cm_id %d PORT %d CONNECTED to 0x%x!\n", + " active_cb: cm_id %d PORT %d CONNECTED to %s!\n", conn->cm_id, ntohs(((struct sockaddr_in *) &conn->cm_id->route.addr.dst_addr)->sin_port), - ntohl(((struct sockaddr_in *) - &conn->cm_id->route.addr.dst_addr)->sin_addr.s_addr)); + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr)); /* setup local and 
remote ports for ep query */ - conn->ep->param.remote_port_qual = PORT_TO_SID(rdma_get_dst_port(conn->cm_id)); - conn->ep->param.local_port_qual = PORT_TO_SID(rdma_get_src_port(conn->cm_id)); + conn->ep->param.remote_port_qual = + PORT_TO_SID(rdma_get_dst_port(conn->cm_id)); + conn->ep->param.local_port_qual = + PORT_TO_SID(rdma_get_src_port(conn->cm_id)); dapl_evd_connection_callback(conn, IB_CME_CONNECTED, - event->param.conn.private_data, conn->ep); + event->param.conn.private_data, + conn->ep); break; case RDMA_CM_EVENT_DISCONNECTED: @@ -383,9 +398,6 @@ static void dapli_cm_passive_cb(struct dapl_cm_id *conn, struct rdma_cm_event *event) { struct dapl_cm_id *new_conn; -#ifdef DAPL_DBG - struct rdma_addr *ipaddr = &conn->cm_id->route.addr; -#endif dapl_dbg_log(DAPL_DBG_TYPE_CM, " passive_cb: conn %p id %d event %d\n", @@ -410,57 +422,43 @@ static void dapli_cm_passive_cb(struct dapl_cm_id *conn, break; case RDMA_CM_EVENT_UNREACHABLE: case RDMA_CM_EVENT_CONNECT_ERROR: - - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " dapli_cm_passive: CONN_ERR " - " event=0x%x status=%d %s" - " on SRC 0x%x,0x%x DST 0x%x,0x%x\n", + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cm_passive: CONN_ERR event=0x%x status=%d %s," + " DST %s,%d\n", event->event, event->status, - (event->status == -110)?"TIMEOUT":"", - ntohl(((struct sockaddr_in *) - &ipaddr->src_addr)->sin_addr.s_addr), - ntohs(((struct sockaddr_in *) - &ipaddr->src_addr)->sin_port), - ntohl(((struct sockaddr_in *) - &ipaddr->dst_addr)->sin_addr.s_addr), - ntohs(((struct sockaddr_in *) - &ipaddr->dst_addr)->sin_port)); + (event->status == -ETIMEDOUT)?"TIMEOUT":"", + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); dapls_cr_callback(conn, IB_CME_DESTINATION_UNREACHABLE, - NULL, conn->sp); + NULL, conn->sp); break; case RDMA_CM_EVENT_REJECTED: { ib_cm_events_t cm_event; - /* no device type specified so assume IB for now */ - 
if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + /* valid REJ from consumer will always contain private data */ + if (event->status == 28 && + event->param.conn.private_data_len) cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; - else + else { cm_event = IB_CME_DESTINATION_REJECT; - - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " dapli_cm_passive: REJECTED reason=%d" - " on SRC 0x%x,0x%x DST 0x%x,0x%x\n", - event->status, - ntohl(((struct sockaddr_in *) - &ipaddr->src_addr)->sin_addr.s_addr), - ntohs(((struct sockaddr_in *) - &ipaddr->src_addr)->sin_port), - ntohl(((struct sockaddr_in *) - &ipaddr->dst_addr)->sin_addr.s_addr), - ntohs(((struct sockaddr_in *) - &ipaddr->dst_addr)->sin_port)); - + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cm_active: non-consumer REJ, reason=%d," + " DST %s, %d\n", + event->status, + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); + } dapls_cr_callback(conn, cm_event, NULL, conn->sp); - break; } case RDMA_CM_EVENT_ESTABLISHED: - dapl_dbg_log(DAPL_DBG_TYPE_CM, " passive_cb: cm_id %p PORT %d CONNECTED from 0x%x!\n", conn->cm_id, @@ -559,9 +557,12 @@ DAT_RETURN dapls_ib_connect(IN DAT_EP_HANDLE ep_handle, if (rdma_resolve_addr(conn->cm_id, NULL, (struct sockaddr *)&conn->r_addr, - conn->arp_timeout)) + conn->arp_timeout)) { + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_connect: rdma_resolve_addr ERR %s\n", + strerror(errno)); return dapl_convert_errno(errno,"ib_connect"); - + } dapl_dbg_log(DAPL_DBG_TYPE_CM, " connect: resolve_addr: cm_id %p -> %s port %d\n", conn->cm_id, @@ -815,9 +816,9 @@ dapls_ib_accept_connection(IN DAT_CR_HANDLE cr_handle, */ dat_status = dapls_ib_qp_alloc(ia_ptr, ep_ptr, NULL); if (dat_status != DAT_SUCCESS) { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " accept: ib_qp_alloc failed: %d\n", - dat_status); + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_accept: qp_alloc ERR %d\n", + dat_status); goto bail; } } @@ 
-835,11 +836,12 @@ dapls_ib_accept_connection(IN DAT_CR_HANDLE cr_handle, ep_ptr->qp_handle->cm_id->qp = NULL; dapli_destroy_conn(ep_ptr->qp_handle); } else { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " accept: ERR dev(%p!=%p) or port mismatch(%d!=%d)\n", + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_accept: ERR dev(%p!=%p) or" + " port mismatch(%d!=%d)\n", ep_ptr->qp_handle->cm_id->verbs,cr_conn->cm_id->verbs, - ep_ptr->qp_handle->cm_id->port_num, - cr_conn->cm_id->port_num ); + ntohs(ep_ptr->qp_handle->cm_id->port_num), + ntohs(cr_conn->cm_id->port_num)); dat_status = DAT_INTERNAL_ERROR; goto bail; } @@ -850,7 +852,8 @@ dapls_ib_accept_connection(IN DAT_CR_HANDLE cr_handle, ret = rdma_accept(cr_conn->cm_id, &cr_conn->params); if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_ERR," accept: ERROR %d\n", ret); + dapl_log(DAPL_DBG_TYPE_ERR," dapl_cma_accept: ERR %d %s\n", + ret, strerror(errno)); dat_status = dapl_convert_errno(ret, "accept"); goto bail; } @@ -909,6 +912,10 @@ dapls_ib_reject_connection( return DAT_SUCCESS; } + /* + * Private data is needed so peer can determine real application + * reject from an abnormal application termination + */ ret = rdma_reject(cm_handle->cm_id, NULL, 0); dapli_destroy_conn(cm_handle); @@ -1163,11 +1170,12 @@ void dapli_cma_event_cb(void) break; case RDMA_CM_EVENT_ADDR_ERROR: - dapl_dbg_log(DAPL_DBG_TYPE_WARN, - " CM ADDR ERROR: -> %s retry (%d)..\n", - inet_ntoa(((struct sockaddr_in *) + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cma_active: CM ADDR ERROR: ->" + " DST %s retry (%d)..\n", + inet_ntoa(((struct sockaddr_in *) &conn->r_addr)->sin_addr), - conn->arp_retries); + conn->arp_retries); /* retry address resolution */ if ((--conn->arp_retries) && @@ -1188,27 +1196,47 @@ void dapli_cma_event_cb(void) } } /* retries exhausted or resolve_addr failed */ + dapl_log(DAPL_DBG_TYPE_ERR, + "dapl_cma_active: ARP_ERR, retries(%d)" + " exhausted -> DST %s,%d\n", + IB_ARP_RETRY_COUNT, + inet_ntoa(((struct sockaddr_in *) + 
&conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); + dapl_evd_connection_callback( conn, IB_CME_DESTINATION_UNREACHABLE, NULL, conn->ep); break; - case RDMA_CM_EVENT_ROUTE_ERROR: - dapl_dbg_log(DAPL_DBG_TYPE_WARN, - " CM ROUTE ERROR: -> %s retry (%d)..\n", - inet_ntoa(((struct sockaddr_in *) + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cma_active: CM ROUTE ERROR: ->" + " DST %s retry (%d)..\n", + inet_ntoa(((struct sockaddr_in *) &conn->r_addr)->sin_addr), - conn->route_retries ); + conn->route_retries ); /* retry route resolution */ if ((--conn->route_retries) && (event->status == -ETIMEDOUT)) dapli_addr_resolve(conn); - else - dapl_evd_connection_callback( conn, + else { + dapl_log(DAPL_DBG_TYPE_ERR, + "dapl_cma_active: PATH_RECORD_ERR," + " retries(%d) exhausted, DST %s,%d\n", + IB_ROUTE_RETRY_COUNT, + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); + + dapl_evd_connection_callback( + conn, IB_CME_DESTINATION_UNREACHABLE, NULL, conn->ep); + } break; case RDMA_CM_EVENT_DEVICE_REMOVAL: diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index e900b59..fcd8163 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -113,9 +113,10 @@ static int getipaddr(char *name, char *addr, int len) /* retry using network device name */ ret = getipaddr_netdev(name,addr,len); if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_WARN, - " getipaddr: invalid name, addr, or netdev(%s)\n", - name); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: getaddr_netdev ERROR:" + " %s. 
Is %s configured?\n", + strerror(errno), name); return ret; } } else { @@ -238,18 +239,19 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) /* cm_id will bind local device/GID based on IP address */ if (rdma_create_id(g_cm_events, &cm_id, (void*)hca_ptr, RDMA_PS_TCP)) { - dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: ERR with RDMA channel: %s\n", - strerror(errno)); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: rdma_create_id ERR %s\n", + strerror(errno)); return DAT_INTERNAL_ERROR; } ret = rdma_bind_addr(cm_id, (struct sockaddr *)&hca_ptr->hca_address); if ((ret) || (cm_id->verbs == NULL)) { rdma_destroy_id(cm_id); - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " open_hca: ERR bind (%d) %s \n", - ret, strerror(-ret)); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: rdma_bind ERR %s." + " Is %s configured?\n", + strerror(errno),hca_name); return DAT_INVALID_ADDRESS; } @@ -282,9 +284,9 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) hca_ptr->ib_trans.ib_cq = ibv_create_comp_channel(hca_ptr->ib_hca_handle); if (hca_ptr->ib_trans.ib_cq == NULL) { - dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: ERR with CQ channel: %s\n", - strerror(errno)); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: ibv_create_comp_channel ERR %s\n", + strerror(errno)); goto bail; } dapl_dbg_log (DAPL_DBG_TYPE_UTIL, @@ -294,9 +296,10 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) opts = fcntl(hca_ptr->ib_trans.ib_cq->fd, F_GETFL); /* uCQ */ if (opts < 0 || fcntl(hca_ptr->ib_trans.ib_cq->fd, F_SETFL, opts | O_NONBLOCK) < 0) { - dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: ERR with CQ FD (%d)\n", - hca_ptr->ib_trans.ib_cq->fd); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: fcntl on ib_cq->fd %d ERR %d %s\n", + hca_ptr->ib_trans.ib_cq->fd, opts, + strerror(errno)); goto bail; } @@ -453,19 +456,13 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->ia_address_ptr = (DAT_IA_ADDRESS_PTR)&hca_ptr->hca_address; - 
dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " query_hca: %s %s %d.%d.%d.%d\n", hca_ptr->name, + dapl_log(DAPL_DBG_TYPE_UTIL, + "dapl_query_hca: %s %s %s\n", hca_ptr->name, ((struct sockaddr_in *) ia_attr->ia_address_ptr)->sin_family == AF_INET ? "AF_INET":"AF_INET6", - ((struct sockaddr_in *) - ia_attr->ia_address_ptr)->sin_addr.s_addr >> 0 & 0xff, - ((struct sockaddr_in *) - ia_attr->ia_address_ptr)->sin_addr.s_addr >> 8 & 0xff, - ((struct sockaddr_in *) - ia_attr->ia_address_ptr)->sin_addr.s_addr >> 16 & 0xff, - ((struct sockaddr_in *) - ia_attr->ia_address_ptr)->sin_addr.s_addr >> 24 & 0xff); + inet_ntoa(((struct sockaddr_in *) + ia_attr->ia_address_ptr)->sin_addr)); ia_attr->hardware_version_major = dev_attr.hw_ver; ia_attr->max_eps = dev_attr.max_qp; @@ -500,14 +497,15 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->extension_supported = DAT_EXTENSION_IB; ia_attr->extension_version = DAT_IB_EXTENSION_VERSION; #endif - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " query_hca: (ver=%x) ep %d ep_q %d evd %d evd_q %d\n", + dapl_log(DAPL_DBG_TYPE_UTIL, + "dapl_query_hca: (ver=%x) ep's %d ep_q %d" + " evd's %d evd_q %d\n", ia_attr->hardware_version_major, ia_attr->max_eps, ia_attr->max_dto_per_ep, ia_attr->max_evds, ia_attr->max_evd_qlen ); - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " query_hca: msg %llu rdma %llu iov %d lmr %d rmr %d" - " rd_io %d inline=%d\n", + dapl_log(DAPL_DBG_TYPE_UTIL, + "dapl_query_hca: msg %llu rdma %llu iov's %d" + " lmr %d rmr %d rd_io %d inline=%d\n", ia_attr->max_mtu_size, ia_attr->max_rdma_size, ia_attr->max_iov_segments_per_dto, ia_attr->max_lmrs, ia_attr->max_rmrs, ia_attr->max_rdma_read_per_ep_in, @@ -526,8 +524,9 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ep_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom; ep_attr->max_rdma_read_iov= dev_attr.max_sge; ep_attr->max_rdma_write_iov= dev_attr.max_sge; - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " query_hca: MAX msg %llu dto %d iov %d rdma i%d,o%d\n", + dapl_log(DAPL_DBG_TYPE_UTIL, + 
"dapl_query_hca: MAX msg %llu dto %d iov %d" + " rdma i%d,o%d\n", ep_attr->max_mtu_size, ep_attr->max_recv_dtos, ep_attr->max_recv_iov, ep_attr->max_rdma_read_in, ep_attr->max_rdma_read_out); @@ -708,9 +707,9 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) struct dapl_ep *evd_ptr = event.element.cq->cq_context; - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " async_event CQ (%p) ERR %d\n", + dapl_log( + DAPL_DBG_TYPE_ERR, + "dapl async_event CQ (%p) ERR %d\n", evd_ptr, event.event_type); /* report up if async callback still setup */ @@ -724,7 +723,7 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) case IBV_EVENT_COMM_EST: { /* Received msgs on connected QP before RTU */ - dapl_dbg_log( + dapl_log( DAPL_DBG_TYPE_UTIL, " async_event COMM_EST(%p) rdata beat RTU\n", event.element.qp); @@ -742,9 +741,9 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) struct dapl_ep *ep_ptr = event.element.qp->qp_context; - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " async_event QP (%p) ERR %d\n", + dapl_log( + DAPL_DBG_TYPE_ERR, + "dapl async_event QP (%p) ERR %d\n", ep_ptr, event.event_type); /* report up if async callback still setup */ @@ -764,8 +763,8 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) case IBV_EVENT_PKEY_CHANGE: case IBV_EVENT_SM_CHANGE: { - dapl_dbg_log(DAPL_DBG_TYPE_WARN, - " async_event: DEV ERR %d\n", + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl async_event: DEV ERR %d\n", event.event_type); /* report up if async callback still setup */ @@ -778,13 +777,13 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) } case IBV_EVENT_CLIENT_REREGISTER: /* no need to report this event this time */ - dapl_dbg_log (DAPL_DBG_TYPE_WARN, + dapl_log (DAPL_DBG_TYPE_UTIL, " async_event: IBV_EVENT_CLIENT_REREGISTER\n"); break; default: - dapl_dbg_log (DAPL_DBG_TYPE_WARN, - " async_event: %d UNKNOWN\n", + dapl_log (DAPL_DBG_TYPE_WARN, + "dapl async_event: %d UNKNOWN\n", event.event_type); break; diff --git a/dapl/udapl/dapl_init.c 
b/dapl/udapl/dapl_init.c index ce92f9f..a4afba5 100644 --- a/dapl/udapl/dapl_init.c +++ b/dapl/udapl/dapl_init.c @@ -70,16 +70,19 @@ void dapl_init ( void ) { DAT_RETURN dat_status; -#if defined(DAPL_DBG) - dapl_dbg_log (DAPL_DBG_TYPE_UTIL, "DAPL: (dapl_init)\n"); - /* set up debug type */ g_dapl_dbg_type = dapl_os_get_env_val ( "DAPL_DBG_TYPE", - DAPL_DBG_TYPE_ERR | DAPL_DBG_TYPE_WARN); + DAPL_DBG_TYPE_ERR ); /* set up debug destination */ g_dapl_dbg_dest = dapl_os_get_env_val ( "DAPL_DBG_DEST", DAPL_DBG_DEST_STDOUT ); -#endif /* DAPL_DBG */ + + /* open log file on first logging call if necessary */ + if (g_dapl_dbg_dest & DAPL_DBG_DEST_SYSLOG) + openlog("libdapl", LOG_ODELAY|LOG_PID|LOG_CONS, LOG_USER); + + dapl_log (DAPL_DBG_TYPE_UTIL, "dapl_init: dbg_type=0x%x,dbg_dest=0x%x\n", + g_dapl_dbg_type, g_dapl_dbg_dest); /* See if the user is on a loopback setup */ g_dapl_loopback_connection = dapl_os_get_env_bool ( "DAPL_LOOPBACK" ); @@ -156,6 +159,9 @@ void dapl_fini ( void ) dapl_dbg_log (DAPL_DBG_TYPE_UTIL, "DAPL: Exit (dapl_fini)\n"); + if (g_dapl_dbg_dest & DAPL_DBG_DEST_SYSLOG) + closelog(); + return; } diff --git a/dapl/udapl/linux/dapl_osd.h b/dapl/udapl/linux/dapl_osd.h index caf971f..42ced41 100644 --- a/dapl/udapl/linux/dapl_osd.h +++ b/dapl/udapl/linux/dapl_osd.h @@ -541,7 +541,7 @@ dapl_os_strtol(const char *nptr, char **endptr, int base) #define dapl_os_assert(expression) assert(expression) #define dapl_os_printf(...) 
printf(__VA_ARGS__) #define dapl_os_vprintf(fmt,args) vprintf(fmt,args) -#define dapl_os_syslog(fmt,args) vsyslog (LOG_USER | LOG_DEBUG,fmt,args) +#define dapl_os_syslog(fmt,args) vsyslog(LOG_USER|LOG_WARNING,fmt,args) -- 1.5.2.5 From arlin.r.davis at intel.com Fri Apr 4 16:41:17 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Fri, 4 Apr 2008 16:41:17 -0700 Subject: [ofa-general] [PATCH 4/4][v2] dapl: update vendor information for OFA v2 provider Message-ID: Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/include/dapl_vendor.h | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dapl/include/dapl_vendor.h b/dapl/include/dapl_vendor.h index e87467a..f6d3cc0 100644 --- a/dapl/include/dapl_vendor.h +++ b/dapl/include/dapl_vendor.h @@ -52,14 +52,14 @@ * Product name of the adapter. * Returned in DAT_IA_ATTR.adapter_name */ -#define VN_ADAPTER_NAME "Generic InfiniBand HCA" +#define VN_ADAPTER_NAME "Generic OpenFabrics HCA" /* * Vendor name * Returned in DAT_IA_ATTR.vendor_name */ -#define VN_VENDOR_NAME "DAPL Reference Implementation" +#define VN_VENDOR_NAME "DAPL OpenFabrics Implementation" /********************************************************************** @@ -78,7 +78,7 @@ * DAT_PROVIDER_ATTR.provider_version_minor */ -#define VN_PROVIDER_MAJOR 1 +#define VN_PROVIDER_MAJOR 2 #define VN_PROVIDER_MINOR 0 /* -- 1.5.2.5 From arlin.r.davis at intel.com Fri Apr 4 16:41:05 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Fri, 4 Apr 2008 16:41:05 -0700 Subject: [ofa-general] [PATCH 3/4][v2] dapl: add provider vendor revision data in private data with reject Message-ID: Add 1 byte header containing provider/vendor major revision to distinguish between consumer and non-consumer rejects. Validate size of consumer reject privated data. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_cm.c | 39 ++++++++++++++++++++++++++++++++------- dapl/openib_cma/dapl_ib_util.h | 2 +- 2 files changed, 33 insertions(+), 8 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index 33f299d..dcdcc5b 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -45,6 +45,7 @@ #include "dapl_cr_util.h" #include "dapl_name_service.h" #include "dapl_ib_util.h" +#include "dapl_vendor.h" #include #include #include @@ -79,6 +80,14 @@ static inline uint64_t cpu_to_be64(uint64_t x) { return x; } #define PORT_TO_SID(p) ntohs(p) +/* private data header to validate consumer rejects versus abnormal events */ +struct dapl_pdata_hdr { + uint8_t version; +}; +static struct dapl_pdata_hdr pdata_hdr = { + .version = VN_PROVIDER_MAJOR +}; + static void dapli_addr_resolve(struct dapl_cm_id *conn) { int ret; @@ -900,6 +909,7 @@ dapls_ib_reject_connection( IN const DAT_PVOID private_data) { int ret; + int offset = sizeof(struct dapl_pdata_hdr); dapl_dbg_log(DAPL_DBG_TYPE_CM, " reject(cm_handle %p reason %x)\n", @@ -909,14 +919,29 @@ dapls_ib_reject_connection( dapl_dbg_log(DAPL_DBG_TYPE_ERR, " reject: invalid handle: reason %d\n", reason); - return DAT_SUCCESS; + return DAT_ERROR (DAT_INVALID_HANDLE,DAT_INVALID_HANDLE_CR); } - + + if (private_data_size > + dapls_ib_private_data_size( + NULL, IB_MAX_REJ_PDATA_SIZE, cm_handle->hca)) + return DAT_ERROR(DAT_INVALID_PARAMETER, DAT_INVALID_ARG3); + + /* setup pdata_hdr and users data, in CR pdata buffer */ + dapl_os_memcpy(cm_handle->p_data, &pdata_hdr, offset); + if (private_data_size) + dapl_os_memcpy(cm_handle->p_data+offset, + private_data, + private_data_size); + /* - * Private data is needed so peer can determine real application - * reject from an abnormal application termination + * Always some private data with reject so active peer can + * determine real application reject from an abnormal + * 
application termination */ - ret = rdma_reject(cm_handle->cm_id, NULL, 0); + ret = rdma_reject(cm_handle->cm_id, + cm_handle->p_data, + offset+private_data_size); dapli_destroy_conn(cm_handle); return dapl_convert_errno(ret, "reject"); @@ -1005,7 +1030,7 @@ int dapls_ib_private_data_size( IN DAPL_PRIVATE *prd_ptr, if (hca_ptr->ib_hca_handle->device->transport_type == IBV_TRANSPORT_IWARP) - return(IWARP_MAX_PDATA_SIZE); + return(IWARP_MAX_PDATA_SIZE-sizeof(struct dapl_pdata_hdr)); switch(conn_op) { @@ -1016,7 +1041,7 @@ int dapls_ib_private_data_size( IN DAPL_PRIVATE *prd_ptr, size = IB_MAX_REP_PDATA_SIZE; break; case DAPL_PDATA_CONN_REJ: - size = IB_MAX_REJ_PDATA_SIZE; + size = IB_MAX_REJ_PDATA_SIZE-sizeof(struct dapl_pdata_hdr); break; case DAPL_PDATA_CONN_DREQ: size = IB_MAX_DREQ_PDATA_SIZE; diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index f35cb9d..370f3b1 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -181,7 +181,7 @@ struct dapl_cm_id { struct rdma_conn_param params; DAT_SOCK_ADDR6 r_addr; int p_len; - unsigned char p_data[IB_MAX_DREP_PDATA_SIZE]; + unsigned char p_data[256]; /* dapl max private data size */ }; typedef struct dapl_cm_id *dp_ib_cm_handle_t; -- 1.5.2.5 From bs at q-leap.de Fri Apr 4 16:45:47 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sat, 5 Apr 2008 01:45:47 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080404152932.5e294e47.weiny2@llnl.gov> References: <200804050012.39893.bs@q-leap.de> <20080404152932.5e294e47.weiny2@llnl.gov> Message-ID: <20080404234547.GA17618@lanczos.q-leap.de> On Fri, Apr 04, 2008 at 03:29:32PM -0700, Ira Weiny wrote: > On Sat, 5 Apr 2008 00:12:39 +0200 > Bernd Schubert wrote: > > > Hello, > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > > much better there, at least no further RcvSwRelayErrors, even when the > > cluster is in idle state and so far also no SymbolErrors, which we also have > > seens 
before. > > > > However, after I just started a lustre stress test on 50 clients (to a lustre > > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > > 9000 XmtDiscards within 30 minutes. > > Yea, those are bad. > > > > > > Searching for this error I find "This is a symptom of congestion and may > > require tweaking either HOQ or switch lifetime values". > > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > > it. I also have no idea how to set switch lifetime values. I guess this > > isn't related to the opensm timeout option, is it? > > Yes you should adjust these values. > > > > > > Hmm, I just found a Cisco PDF describing how to set the lifetime on these > > switches, but is this also possible on Flextronics switches? > > > > I don't know about the Vendor SMs but in opensm look for the following options > in the opensm.opts file (Default path is: /var/cache/opensm): > > # The code of maximal time a packet can wait at the head of > # transmission queue. > # The actual time is 4.096usec * 2^ > # The value 0x14 disables this mechanism > head_of_queue_lifetime 0x12 > > # The maximal time a packet can wait at the head of queue on > # switch port connected to a CA or router port > leaf_head_of_queue_lifetime 0x0c Hmm, I first increased head_of_queue_lifetime to 0x13 and leaf_head_of_queue_lifetime to 0x20, but this didn't make the error go away. So I increased head_of_queue_lifetime to 0x15 and leaf_head_of_queue_lifetime to 0x50, but this made the entire fabric crash.
On the node of the master opensm I got an endless number of messages like these: Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out The slave opensm also went into D-state and is not killable anymore :( Seems I have to be very careful with these settings... Thanks for your help, Bernd From arlin.r.davis at intel.com Fri Apr 4 16:40:10 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 4 Apr 2008 16:40:10 -0700 Subject: [ofa-general] [PATCH 1/4][v2] dapl: add support for private data in CR reject. Message-ID: <000001c896ad$3b2d6b00$14fd070a@amr.corp.intel.com> Private data support via dat_cr_reject was added to the v2 DAT specification but dapl was never extended to support it at the provider level. Add support in OFA uDAPL provider.
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_adapter_util.h | 6 ++++-- dapl/common/dapl_cr_callback.c | 9 ++++++--- dapl/common/dapl_cr_reject.c | 3 ++- dapl/ibal-scm/dapl_ibal-scm_cm.c | 4 +++- dapl/ibal/dapl_ibal_cm.c | 4 +++- dapl/openib/dapl_ib_cm.c | 4 +++- dapl/openib_cma/dapl_ib_cm.c | 6 +++++- dapl/openib_scm/dapl_ib_cm.c | 4 +++- 8 files changed, 29 insertions(+), 11 deletions(-) diff --git a/dapl/common/dapl_adapter_util.h b/dapl/common/dapl_adapter_util.h index d664bf6..43175a9 100755 --- a/dapl/common/dapl_adapter_util.h +++ b/dapl/common/dapl_adapter_util.h @@ -112,8 +112,10 @@ DAT_RETURN dapls_ib_accept_connection ( IN const DAT_PVOID private_data); DAT_RETURN dapls_ib_reject_connection ( - IN dp_ib_cm_handle_t cm_handle, - IN int reject_reason); + IN dp_ib_cm_handle_t cm_handle, + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data); DAT_RETURN dapls_ib_setup_async_callback ( IN DAPL_IA *ia_ptr, diff --git a/dapl/common/dapl_cr_callback.c b/dapl/common/dapl_cr_callback.c index 46d2b4c..aafdbfb 100644 --- a/dapl/common/dapl_cr_callback.c +++ b/dapl/common/dapl_cr_callback.c @@ -173,7 +173,8 @@ dapls_cr_callback ( dapl_dbg_log (DAPL_DBG_TYPE_CM, "---> dapls_cr_callback: conn event on down SP\n"); (void)dapls_ib_reject_connection (ib_cm_handle, - DAT_CONNECTION_EVENT_UNREACHABLE ); + DAT_CONNECTION_EVENT_UNREACHABLE, + 0, NULL); return; } @@ -300,7 +301,8 @@ dapls_cr_callback ( { /* The event post failed; take appropriate action. 
*/ (void)dapls_ib_reject_connection ( ib_cm_handle, - DAT_CONNECTION_EVENT_BROKEN); + DAT_CONNECTION_EVENT_BROKEN, + 0, NULL); return; } @@ -456,7 +458,8 @@ dapli_connection_request ( { dapls_cr_free (cr_ptr); (void)dapls_ib_reject_connection (ib_cm_handle, - DAT_CONNECTION_EVENT_BROKEN); + DAT_CONNECTION_EVENT_BROKEN, + 0, NULL); /* Take the CR off the list, we can't use it */ dapl_os_lock (&sp_ptr->header.lock); diff --git a/dapl/common/dapl_cr_reject.c b/dapl/common/dapl_cr_reject.c index d6842b3..029cdfa 100755 --- a/dapl/common/dapl_cr_reject.c +++ b/dapl/common/dapl_cr_reject.c @@ -97,7 +97,8 @@ dapl_cr_reject ( } dat_status = dapls_ib_reject_connection ( cr_ptr->ib_cm_handle, - IB_CM_REJ_REASON_CONSUMER_REJ ); + IB_CM_REJ_REASON_CONSUMER_REJ, + pdata_size, pdata ); if ( dat_status != DAT_SUCCESS) { diff --git a/dapl/ibal-scm/dapl_ibal-scm_cm.c b/dapl/ibal-scm/dapl_ibal-scm_cm.c index fcf5215..df83008 100644 --- a/dapl/ibal-scm/dapl_ibal-scm_cm.c +++ b/dapl/ibal-scm/dapl_ibal-scm_cm.c @@ -951,7 +951,9 @@ dapls_ib_accept_connection ( DAT_RETURN dapls_ib_reject_connection ( IN dp_ib_cm_handle_t ib_cm_handle, - IN int reject_reason ) + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { ib_cm_srvc_handle_t cm_ptr = ib_cm_handle; diff --git a/dapl/ibal/dapl_ibal_cm.c b/dapl/ibal/dapl_ibal_cm.c index 6cd652f..a986430 100644 --- a/dapl/ibal/dapl_ibal_cm.c +++ b/dapl/ibal/dapl_ibal_cm.c @@ -1228,7 +1228,9 @@ dapls_ib_remove_conn_listener ( */ DAT_RETURN dapls_ib_reject_connection ( IN dp_ib_cm_handle_t ib_cm_handle, - IN int reject_reason ) + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { ib_api_status_t ib_status; ib_cm_rej_t cm_rej; diff --git a/dapl/openib/dapl_ib_cm.c b/dapl/openib/dapl_ib_cm.c index 76d5968..c887a0b 100644 --- a/dapl/openib/dapl_ib_cm.c +++ b/dapl/openib/dapl_ib_cm.c @@ -971,7 +971,9 @@ reject: DAT_RETURN dapls_ib_reject_connection ( IN ib_cm_handle_t 
cm_handle, - IN int reject_reason ) + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { int status; diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index cf79142..a040ffb 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -890,7 +890,11 @@ bail: * */ DAT_RETURN -dapls_ib_reject_connection(IN dp_ib_cm_handle_t cm_handle, IN int reason) +dapls_ib_reject_connection( + IN dp_ib_cm_handle_t cm_handle, + IN int reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { int ret; diff --git a/dapl/openib_scm/dapl_ib_cm.c b/dapl/openib_scm/dapl_ib_cm.c index 485ab9b..94e3959 100644 --- a/dapl/openib_scm/dapl_ib_cm.c +++ b/dapl/openib_scm/dapl_ib_cm.c @@ -759,7 +759,9 @@ dapls_ib_accept_connection ( DAT_RETURN dapls_ib_reject_connection ( IN ib_cm_handle_t ib_cm_handle, - IN int reject_reason ) + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { ib_cm_srvc_handle_t cm_ptr = ib_cm_handle; -- 1.5.2.5 From sean.hefty at intel.com Fri Apr 4 16:53:09 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 4 Apr 2008 16:53:09 -0700 Subject: [ofa-general] [PATCH 3/4][v2] dapl: add provider vendor revisiondata in private data with reject In-Reply-To: References: Message-ID: <001301c896af$09bacf20$3c98070a@amr.corp.intel.com> >Add 1 byte header containing provider/vendor major revision >to distinguish between consumer and non-consumer rejects. >Validate size of consumer reject privated data. Not saying this is a bad idea, but doesn't it break the protocol with existing DAPL? It also shifts all of the existing private data off by a byte, which could result in odd data alignment. 
- Sean

From andrea at qumranet.com Fri Apr 4 17:23:30 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Sat, 5 Apr 2008 02:23:30 +0200
Subject: [ofa-general] Re: [PATCH] mmu notifier #v11
In-Reply-To:
References: <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> <20080404202055.GA14784@duo.random>
Message-ID: <20080405002330.GF14784@duo.random>

On Fri, Apr 04, 2008 at 03:06:18PM -0700, Christoph Lameter wrote:
> Adds some comments. Still objectionable is the multiple ways of
> invalidating pages in #v11. Callout now has similar locking to emm.

range_begin exists because range_end is called after the page has already been freed. invalidate_page is called _before_ the page is freed but _after_ the pte has been zapped.

In short, when working with single pages it's a waste to block the secondary-mmu page fault, because it's zero cost to invalidate_page before put_page. Not even GRU needs to do that. Instead, for the multiple-pte-zapping case we have to call range_end _after_ the pages are already freed, so that there is a single range_end call for a huge amount of address space. That is why we need a range_begin for the subsystems not using page pinning, for example.

When working with single pages (try_to_unmap_one, do_wp_page), invalidate_page avoids blocking the secondary mmu page fault and is in turn faster. Besides avoiding the need to serialize the secondary mmu page fault, invalidate_page also reduces the overhead when the mmu notifiers are disarmed (i.e. kvm not running).
From andrea at qumranet.com Fri Apr 4 17:41:27 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Sat, 5 Apr 2008 02:41:27 +0200
Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim
In-Reply-To: <47F6B5EA.6060106@goop.org>
References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> <47F6B5EA.6060106@goop.org>
Message-ID: <20080405004127.GG14784@duo.random>

On Fri, Apr 04, 2008 at 04:12:42PM -0700, Jeremy Fitzhardinge wrote:
> I think you can break this if() down a bit:
>
> if (!(vma->vm_file && vma->vm_file->f_mapping))
>     continue;

It makes no difference at runtime; coding style preferences are quite subjective.

> So this is an O(n^2) algorithm to take the i_mmap_locks from low to high
> order? A comment would be nice. And O(n^2)? Ouch. How often is it
> called?

It's called a single time, when the mmu notifier is registered. It's a very slow path of course. Any other approach to reduce the complexity would require memory allocations and would force mmu_notifier_register to return -ENOMEM failure. It didn't seem worth it.

> And is it necessary to mush lock and unlock together? Unlock ordering
> doesn't matter, so you should just be able to have a much simpler loop, no?

That avoids duplicating .text. Originally they were separated. unlock can't be a simpler loop because I didn't reserve vm_flags bitflags to do a single O(N) loop for unlock. If you do malloc+fork+munmap, two vmas will point to the same anon-vma lock; that's why the unlock isn't simpler unless I mark what I locked with a vm_flags bitflag.

From boris at mellanox.com Fri Apr 4 17:48:18 2008
From: boris at mellanox.com (Boris Shpolyansky)
Date: Fri, 4 Apr 2008 17:48:18 -0700
Subject: [ofa-general] XmtDiscards
In-Reply-To: <20080404234547.GA17618@lanczos.q-leap.de>
Message-ID: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com>

Bernd,

0x14 is the maximal value for HOQ lifetime, which effectively disables the mechanism.
I think you shouldn't exceed this value.

Boris

-----Original Message-----
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Bernd Schubert
Sent: Friday, April 04, 2008 4:46 PM
To: Ira Weiny
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] XmtDiscards

On Fri, Apr 04, 2008 at 03:29:32PM -0700, Ira Weiny wrote:
> On Sat, 5 Apr 2008 00:12:39 +0200
> Bernd Schubert wrote:
>
> > Hello,
> >
> > after I upgraded one of our clusters to opensm-3.2.1 it seems to
> > have gotten much better there, at least no further RcvSwRelayErrors,
> > even when the cluster is in idle state and so far also no
> > SymbolErrors, which we also have seen before.
> >
> > However, after I just started a lustre stress test on 50 clients (to
> > a lustre storage system with 20 OSS servers and 60 OSTs),
> > ibcheckerrors reports about 9000 XmtDiscards within 30 minutes.
>
> Yea, those are bad.
>
> > Searching for this error I find "This is a symptom of congestion and
> > may require tweaking either HOQ or switch lifetime values".
> > Well, I have to admit I neither know what HOQ is, nor do I know how
> > to tweak it. I also do not have an idea how to set switch lifetime
> > values. I guess this isn't related to the opensm timeout option, is it?
>
> Yes, you should adjust these values.
>
> > Hmm, I just found a Cisco PDF describing how to set the lifetime on
> > these switches, but is this also possible on Flextronics switches?
>
> I don't know about the Vendor SMs but in opensm look for the following
> options in the opensm.opts file (Default path is: /var/cache/opensm):
>
> # The code of maximal time a packet can wait at the head of
> # transmission queue.
> # The actual time is 4.096usec * 2^
> # The value 0x14 disables this mechanism
> head_of_queue_lifetime 0x12
>
> # The maximal time a packet can wait at the head of queue on
> # switch port connected to a CA or router port
> leaf_head_of_queue_lifetime 0x0c

Hmm, I first increased head_of_queue_lifetime to 0x13 and leaf_head_of_queue_lifetime to 0x20, but this didn't make the error go away. So I increased head_of_queue_lifetime to 0x15 and leaf_head_of_queue_lifetime to 0x50, but this made the fabric crash entirely. On the node of the master opensm I got an endless number of messages like these:

Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out
Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs
Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377
Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out

The slave opensm also went into D-state and is not killable anymore :(

Seems I have to be very careful with these settings...

Thanks for your help,
Bernd
_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From andrea at qumranet.com Fri Apr 4 17:57:59 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Sat, 5 Apr 2008 02:57:59 +0200
Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic
In-Reply-To: <20080404223131.469710551@sgi.com>
References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com>
Message-ID: <20080405005759.GH14784@duo.random>

On Fri, Apr 04, 2008 at 03:30:50PM -0700, Christoph Lameter wrote:
> + mm_lock(mm);
> + e->next = mm->emm_notifier;
> + /*
> + * The update to emm_notifier (e->next) must be visible
> + * before the pointer becomes visible.
> + * rcu_assign_pointer() does exactly what we need.
> + */
> + rcu_assign_pointer(mm->emm_notifier, e);
> + mm_unlock(mm);

My mm_lock solution makes all rcu serialization an unnecessary overhead, so you should remove it like I already did in #v11. If that weren't the case, then mm_lock wouldn't be a definitive fix for the race.

> + e = rcu_dereference(e->next);

Same here.

From arlin.r.davis at intel.com Fri Apr 4 23:52:04 2008
From: arlin.r.davis at intel.com (Davis, Arlin R)
Date: Fri, 4 Apr 2008 23:52:04 -0700
Subject: [ofa-general] [PATCH 3/4][v2] dapl: add provider vendor revisiondata in private data with reject
In-Reply-To: <001301c896af$09bacf20$3c98070a@amr.corp.intel.com>
References: <001301c896af$09bacf20$3c98070a@amr.corp.intel.com>
Message-ID:

>>Add 1 byte header containing provider/vendor major revision
>>to distinguish between consumer and non-consumer rejects.
>>Validate size of consumer reject private data.
>
>Not saying this is a bad idea, but doesn't it break the
>protocol with existing DAPL? It also shifts all of the
>existing private data off by a byte, which could result
>in odd data alignment.

If the cma/cm could guarantee that IB_CM_REJ_CONSUMER_DEFINED is always an indication of a true consumer-called reject versus abnormal termination, then I would not need to add the provider header in reject private data. Anyway, private data delivery in rejects is new for DAT v2 and is exposed for the first time with this patch set. There is no compatibility issue with existing DAPL because reject private data has been ignored up until this point. I will adjust for odd data alignment.

Thanks for the feedback,
-arlin
From hrosenstock at xsigo.com Sat Apr 5 06:17:59 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Sat, 05 Apr 2008 06:17:59 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com> References: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com> Message-ID: <1207401479.15625.221.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 17:48 -0700, Boris Shpolyansky wrote: > Bernd, > > 0x14 is the maximal value for HOQ lifetime, which effectively disables > the mechanism. I think you shouldn't exceed this value. True about the maximal value but any 5 bit value > 19 (up through 31) should effectively be the same thing according to the spec. I also think that OpenSM could do a better job validating and setting this and other similar optional parameters. -- Hal > Boris > > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Bernd > Schubert > Sent: Friday, April 04, 2008 4:46 PM > To: Ira Weiny > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] XmtDiscards > > On Fri, Apr 04, 2008 at 03:29:32PM -0700, Ira Weiny wrote: > > On Sat, 5 Apr 2008 00:12:39 +0200 > > Bernd Schubert wrote: > > > > > Hello, > > > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to > > > have gotten much better there, at least no further RcvSwRelayErrors, > > > > even when the cluster is in idle state and so far also no > > > SymbolErrors, which we also have seens before. > > > > > > However, after I just started a lustre stress test on 50 clients (to > > > > a lustre storage system with 20 OSS servers and 60 OSTs), > > > ibcheckerrors reports about 9000 XmtDiscards within 30 minutes. > > > > Yea, those are bad. > > > > > > > > Searching for this error I find "This is a symptom of congestion and > > > > may require tweaking either HOQ or switch lifetime values". 
> > > Well, I have to admit I neither know what HOQ is, nor do I know how > > > to tweak it. I also do not have an idea to set switch lifetime > > > values. I guess this isn't related to the opensm timeout option, is > it? > > > > Yes you should adjust these values. > > > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on > > > these switches, but is this also possible on Flextronics switches? > > > > > > > I don't know about the Vendor SMs but in opensm look for the following > > > options in the opensm.opts file (Default path is: /var/cache/opensm): > > > > # The code of maximal time a packet can wait at the head of > > # transmission queue. > > # The actual time is 4.096usec * 2^ > > # The value 0x14 disables this mechanism > > head_of_queue_lifetime 0x12 > > > > # The maximal time a packet can wait at the head of queue on > > # switch port connected to a CA or router port > > leaf_head_of_queue_lifetime 0x0c > > Hmm, I first increased head_of_queue_lifetime to 0x13 and > leaf_head_of_queue_lifetime to 0x20, but this didn't make the error go > away. So I increased head_of_queue_lifetime to 0x15 and > leaf_head_of_queue_lifetime to 0x50, but this made the fabric to > entirely crash. On the node of the master opensm I got an endless number > of messages like these: > > Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: > transmit timed out Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: > transmit timeout: latency 411908 msecs Apr 5 01:35:03 pfs1n2 kernel: > [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 Apr 5 > 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit > timed out > > The slave opensm also went into D-state and is not killable anymore :( > > Seems I have to be very careful with these settings... 
> > > Thanks for your help, > Bernd > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Sat Apr 5 06:19:43 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Sat, 05 Apr 2008 06:19:43 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <200804050012.39893.bs@q-leap.de> References: <200804050012.39893.bs@q-leap.de> Message-ID: <1207401583.15625.224.camel@hrosenstock-ws.xsigo.com> Hi Bernd, On Sat, 2008-04-05 at 00:12 +0200, Bernd Schubert wrote: > Hello, > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > much better there, at least no further RcvSwRelayErrors, even when the > cluster is in idle state and so far also no SymbolErrors, which we also have > seens before. > > However, after I just started a lustre stress test on 50 clients (to a lustre > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > 9000 XmtDiscards within 30 minutes. > > Searching for this error I find "This is a symptom of congestion and may > require tweaking either HOQ or switch lifetime values". > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > it. I also do not have an idea to set switch lifetime values. I guess this > isn't related to the opensm timeout option, is it? > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > switches, but is this also possible on Flextronics switches? What routing algorithm are you using ? 
Rather than play with those switch values, if you are not using up/down, could you try that to see if it helps with the congestion you are seeing ? -- Hal > Thanks for any help, > Bernd From hrosenstock at xsigo.com Sat Apr 5 06:23:52 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Sat, 05 Apr 2008 06:23:52 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080404234547.GA17618@lanczos.q-leap.de> References: <200804050012.39893.bs@q-leap.de> <20080404152932.5e294e47.weiny2@llnl.gov> <20080404234547.GA17618@lanczos.q-leap.de> Message-ID: <1207401832.15625.229.camel@hrosenstock-ws.xsigo.com> On Sat, 2008-04-05 at 01:45 +0200, Bernd Schubert wrote: > On Fri, Apr 04, 2008 at 03:29:32PM -0700, Ira Weiny wrote: > > On Sat, 5 Apr 2008 00:12:39 +0200 > > Bernd Schubert wrote: > > > > > Hello, > > > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > > > much better there, at least no further RcvSwRelayErrors, even when the > > > cluster is in idle state and so far also no SymbolErrors, which we also have > > > seens before. > > > > > > However, after I just started a lustre stress test on 50 clients (to a lustre > > > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > > > 9000 XmtDiscards within 30 minutes. > > > > Yea, those are bad. > > > > > > > > Searching for this error I find "This is a symptom of congestion and may > > > require tweaking either HOQ or switch lifetime values". > > > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > > > it. I also do not have an idea to set switch lifetime values. I guess this > > > isn't related to the opensm timeout option, is it? > > > > Yes you should adjust these values. > > > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > > > switches, but is this also possible on Flextronics switches? 
> > > > > > > I don't know about the Vendor SMs but in opensm look for the following options > > in the opensm.opts file (Default path is: /var/cache/opensm): > > > > # The code of maximal time a packet can wait at the head of > > # transmission queue. > > # The actual time is 4.096usec * 2^ > > # The value 0x14 disables this mechanism > > head_of_queue_lifetime 0x12 > > > > # The maximal time a packet can wait at the head of queue on > > # switch port connected to a CA or router port > > leaf_head_of_queue_lifetime 0x0c > > Hmm, I first increased head_of_queue_lifetime to 0x13 and > leaf_head_of_queue_lifetime to 0x20, but this didn't make the error > go away. So I increased head_of_queue_lifetime to 0x15 and > leaf_head_of_queue_lifetime to 0x50, but this made the fabric to entirely > crash. On the node of the master opensm I got an endless number of messages > like these: > > Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out > Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs > Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 > Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out > > The slave opensm also went into D-state and is not killable anymore :( > > Seems I have to be very careful with these settings... Yes, those settings are not for the faint of heart and one needs to really understand what changes to those parameters really mean. As far as the slave opensm behavior, this is worth understanding more IMO. 
-- Hal

> Thanks for your help,
> Bernd
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From Brian.Murrell at Sun.COM Sat Apr 5 09:39:29 2008
From: Brian.Murrell at Sun.COM (Brian J.
Murrell)
Date: Sat, 05 Apr 2008 12:39:29 -0400
Subject: [ofa-general] inconsistent use of --with-backport[-patches]
Message-ID: <1207413569.1750.135.camel@pc.ilinx>

I'm trying to help the 1.3 ofa_kernel package along with figuring out which backport patches to use for my kernel source (because the kernel version does not work nicely with ofed_patch.sh's get_backport_dir() function), and there seems to be an inconsistent use of --with-backport-patches between configure and ofed_patch.sh.

ofed_patch.sh takes the following arguments:

    --with-backport-patches)
        WITH_BACKPORT_PATCHES="yes"
        WITH_PATCH="yes"
        ;;
    --without-backport-patches)
        WITH_BACKPORT_PATCHES="no"
        ;;
    --with-backport)
        shift
        BACKPORT_DIR=$1
        ;;
    --with-backport=*)
        BACKPORT_DIR=`expr "x$1" : 'x[^=]*=\(.*\)'`
        ;;

and configure takes the following backport patches arguments:

    --with-backport-patches)
        ofed_patch_params="$ofed_patch_params $1"
        ;;
    --without-backport-patches)
        ofed_patch_params="$ofed_patch_params $1"
        ;;

As you can see, configure accepts the "--with[out]-backport-patches" arguments and simply passes them on to ofed_patch.sh; however, it does not accept the "--with-backport" argument to actually specify a set to use.

b.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL:

From swise at opengridcomputing.com Sat Apr 5 14:43:56 2008
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 05 Apr 2008 16:43:56 -0500
Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ?
In-Reply-To: <47F69D86.9040407@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> <47F69D86.9040407@oracle.com> Message-ID: <47F7F29C.3040102@opengridcomputing.com> iWARP RFCs: > 5040 A Remote Direct Memory Access Protocol Specification. R. Recio, > B. Metzler, P. Culley, J. Hilland, D. Garcia. October 2007. (Format: > TXT=142247 bytes) (Status: PROPOSED STANDARD) > > 5041 Direct Data Placement over Reliable Transports. H. Shah, J. > Pinkerton, R. Recio, P. Culley. October 2007. (Format: TXT=84642 > bytes) (Status: PROPOSED STANDARD) > > 5042 Direct Data Placement Protocol (DDP) / Remote Direct Memory > Access Protocol (RDMAP) Security. J. Pinkerton, E. Deleganes. October > 2007. (Format: TXT=127453 bytes) (Status: PROPOSED STANDARD) > > 5043 Stream Control Transmission Protocol (SCTP) Direct Data Placement > (DDP) Adaptation. C. Bestler, Ed., R. Stewart, Ed.. October 2007. > (Format: TXT=38740 bytes) (Status: PROPOSED STANDARD) > > 5044 Marker PDU Aligned Framing for TCP Specification. P. Culley, U. > Elzur, R. Recio, S. Bailey, J. Carrier. October 2007. (Format: > TXT=168918 bytes) (Status: PROPOSED STANDARD) > > 5045 Applicability of Remote Direct Memory Access Protocol (RDMA) and > Direct Data Placement (DDP). C. Bestler, Ed., L. Coene. October 2007. > (Format: TXT=51749 bytes) (Status: INFORMATIONAL) For RDMA over TCP, refer to 5040, 5041, and 5044. iWARP Verbs: http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf Steve. From swise at opengridcomputing.com Sat Apr 5 14:55:33 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 05 Apr 2008 16:55:33 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? 
In-Reply-To: <47F69D58.6040800@oracle.com>
References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> <47F69D58.6040800@oracle.com>
Message-ID: <47F7F555.2070208@opengridcomputing.com>

Richard Frank wrote:
> Hmmm - so what happens with IWARP NIC when no buffer is posted on recv q
> and a message arrives ?
>
> The spec sez the implementation can terminate the connection.

That is exactly what ammasso and chelsio's rnics do. The spec doesn't mandate this behavior however. So an incoming SEND could be dropped and not ack'd at the TCP level, forcing the client to retransmit. But I don't know of an rnic that does this.

FYI: The reason the rnic implementation might terminate in this case is due to the protocol stack layering. If the rdma layers (mpa, ddp, rdmap) sitting on top of TCP don't tell TCP when to ack something, then the incoming SEND might be acked by TCP before the RDMA layers process the packet. Then the SEND cannot be dropped since it's already acked. So the message either must be buffered until the RECV is posted, or the connection terminated.

Steve.

From richard.frank at oracle.com Sat Apr 5 15:55:40 2008
From: richard.frank at oracle.com (Richard Frank)
Date: Sat, 05 Apr 2008 17:55:40 -0500
Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ?
In-Reply-To: <47F7F29C.3040102@opengridcomputing.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> <47F69D86.9040407@oracle.com> <47F7F29C.3040102@opengridcomputing.com> Message-ID: <47F8036C.1010701@oracle.com> This is all goodness - I'm looking for something specific to what does and does not work with IWARP NICs in OFED today - perhaps a matrix comparing functionality of IB vs iWARP - so we know what not to do and or what to work around when running over IWARP NICS. For now - we could probably just treat them as simple NICs and run RDS over TCP - that ought to at least work.. Steve Wise wrote: > iWARP RFCs: > >> 5040 A Remote Direct Memory Access Protocol Specification. R. Recio, >> B. Metzler, P. Culley, J. Hilland, D. Garcia. October 2007. >> (Format: >> TXT=142247 bytes) (Status: PROPOSED STANDARD) >> >> 5041 Direct Data Placement over Reliable Transports. H. Shah, J. >> Pinkerton, R. Recio, P. Culley. October 2007. (Format: TXT=84642 >> bytes) (Status: PROPOSED STANDARD) >> >> 5042 Direct Data Placement Protocol (DDP) / Remote Direct Memory >> Access Protocol (RDMAP) Security. J. Pinkerton, E. Deleganes. >> October >> 2007. (Format: TXT=127453 bytes) (Status: PROPOSED STANDARD) >> >> 5043 Stream Control Transmission Protocol (SCTP) Direct Data Placement >> (DDP) Adaptation. C. Bestler, Ed., R. Stewart, Ed.. October 2007. >> (Format: TXT=38740 bytes) (Status: PROPOSED STANDARD) >> >> 5044 Marker PDU Aligned Framing for TCP Specification. P. Culley, U. >> Elzur, R. Recio, S. Bailey, J. Carrier. October 2007. (Format: >> TXT=168918 bytes) (Status: PROPOSED STANDARD) >> >> 5045 Applicability of Remote Direct Memory Access Protocol (RDMA) and >> Direct Data Placement (DDP). C. Bestler, Ed., L. Coene. 
October >> 2007. >> (Format: TXT=51749 bytes) (Status: INFORMATIONAL) > > For RDMA over TCP, refer to 5040, 5041, and 5044. > > iWARP Verbs: > > http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf > > > > Steve. From sashak at voltaire.com Sat Apr 5 23:53:14 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 6 Apr 2008 06:53:14 +0000 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080404234547.GA17618@lanczos.q-leap.de> References: <200804050012.39893.bs@q-leap.de> <20080404152932.5e294e47.weiny2@llnl.gov> <20080404234547.GA17618@lanczos.q-leap.de> Message-ID: <20080406065314.GA13374@sashak.voltaire.com> On 01:45 Sat 05 Apr , Bernd Schubert wrote: > > Hmm, I first increased head_of_queue_lifetime to 0x13 and > leaf_head_of_queue_lifetime to 0x20, but this didn't make the error > go away. So I increased head_of_queue_lifetime to 0x15 and > leaf_head_of_queue_lifetime to 0x50, but this made the fabric to entirely > crash. Are you using default (min hops) routing? I think it could be deadlock due to unlimited head_of_queue_lifetime values. > On the node of the master opensm I got an endless number of messages > like these: > > Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out > Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs > Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 > Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out > > The slave opensm also went into D-state and is not killable anymore :( Interesting... Any more details about this? 
Sasha From rdreier at cisco.com Sat Apr 5 21:41:02 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 05 Apr 2008 21:41:02 -0700 Subject: [ofa-general] Directions for verbs API extensions Message-ID: Here is a little document I wrote trying to summarize all the things that we might want to add to the verbs API to support device capabilities that aren't exposed yet. There are a number of issues to resolve, and answers to the questions I ask below would help us make progress towards actually supporting all this. There are a number of verbs that are common to the iWARP/RDMA consortium verbs and the InfiniBand base memory management extensions (IB-BMME). We would probably add one device capability bit for "BMME" (and all iWARP devices could set it) to show support for everything here: - Allocate L_Key/STag. This allocates MR resources without actually registering memory; the MR can then be registered or invalidated as described below. - "Fast register" memory through send queue. This allows a work request to be posted to a send queue to register memory using an L_Key/STag that is in the invalid state. - Local invalidate send work requests, which can be used to invalidate an MR or MW. One subtle point here is that local invalidate operations have very loose ordering, in the sense that they can be executed before earlier requests, but support for fencing local invalidate operations is mandatory in iWARP and only optional in IB. But is there any IB device that currently exists that supports BMME but doesn't support local invalidate fencing? I really hope we can ignore this possibility. - Memory windows associated to a single QP and bound using send work requests posted with the normal post send verb rather than a separate MW verb. (See below for more) In addition there are things that are optional in both specs: - Block-list physical buffer lists; this allows memory regions to be registered with arbitrary size/alignment blocks instead of just page-aligned chunks. 
Yet another capability bit if we want to expose this. There are a few discrepancies between the iWARP and IB verbs that we need to decide on how we want to handle: - In IB-BMME, L_Keys and R_Keys are split up so that there is an 8-bit "key" that is owned by the consumer. As far as I know, there is no analogous concept defined for iWARP STags; is there any point in supporting this IB-only feature (which is optional even in the IB spec)? - Along similar lines, IB defines two types of memory windows, "type 1" and "type 2" and in fact type 2 is split into "2A" and "2B" (the difference is basically whether the MW is associated with just a QP, or with a QP and a PD). iWARP memory windows are always what the IB spec would call type 2B. All the IB devices that I know of with IB-BMME support can handle type 2B memory windows. Is there any point in having our API worry about the distinction between 2A or 2B, or should we just decree that we only handle type 2B? (Does anyone who hasn't just been reading specs even understand the distinction between type 2A and 2B?) - Further, the MW API that we have now, with a separate bind MW verb, corresponds to type 1 MWs. Type 2 MWs are bound by posting a work request using the standard "post send" verb. Given that no IB device drivers have implemented the bind MW verb yet, does it make sense to deprecate the API for type 1 MWs and say that everyone should use type 2[B] MWs only? - iWARP supports "RDMA read with invalidate" send work requests, while IB has no such operation. This makes sense because iWARP requires the buffer used to receive RDMA read responses to have remote write permission, while IB has no such requirement. I don't see a really clean way to handle this except to say that apps have to have "if (IB) do_this(); else /* iWARP */ do_that();" code to use this in a portable way. - Zero-based virtual addresses for memory regions. This is mandatory for iWARP and optional for IB (and is not required even for BMME). 
I think the simplest thing to do is just to have yet another capability bit to say whether a device supports ZBVA or not; all iWARP devices can set it.

Finally, there are proprietary verbs extensions that are only supported by a single device at the moment, which we have to decide if and how to support. It is a tradeoff between making useful features available versus making the already overly complex verbs API even more impossible to fathom, although it seems all of these have users asking for them:

 - ConnectX has XRC, masked atomic operations, and the "block loopback" flag for UD QPs at least.

 - eHCA has "low-latency" QPs.

From pasha at dev.mellanox.co.il Sun Apr 6 08:04:14 2008
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Sun, 06 Apr 2008 18:04:14 +0300
Subject: [ofa-general] MVAPICH2 crashes on mixed fabric
In-Reply-To: 
References: 
Message-ID: <47F8E66E.6060505@dev.mellanox.co.il>

MVAPICH(1) and OMPI have HCA auto-detection, and both of them work well on heterogeneous clusters. I'm not sure about mvapich2, but I think the mvapich-discussion list will be a better place for this kind of question, so I'm forwarding this mail to the mvapich list.

Pasha.

Mike Heinz wrote:
> Hey, all, I'm not sure if this is a known bug or some sort of
> limitation I'm unaware of, but I've been building and testing with the
> OFED 1.3 GA release on a small fabric that has a mix of Arbel-based
> and newer Connect-X HCAs.
>
> What I've discovered is that mvapich and openmpi work fine across the
> entire fabric, but mvapich2 crashes when I use a mix of Arbels and
> Connect-X. The errors vary depending on the test program but here's an
> example:
>
> [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
> .
> .
> .
> (output snipped)
> .
> .
> .
> #-----------------------------------------------------------------------------
> # Benchmarking Sendrecv
> # #processes = 2
> # ( 3 additional processes waiting in MPI_Barrier)
> #-----------------------------------------------------------------------------
>    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
>         0         1000         3.51         3.51         3.51         0.00
>         1         1000         3.63         3.63         3.63         0.52
>         2         1000         3.67         3.67         3.67         1.04
>         4         1000         3.64         3.64         3.64         2.09
>         8         1000         3.67         3.67         3.67         4.16
>        16         1000         3.67         3.67         3.67         8.31
>        32         1000         3.74         3.74         3.74        16.32
>        64         1000         3.90         3.90         3.90        31.28
>       128         1000         4.75         4.75         4.75        51.39
>       256         1000         5.21         5.21         5.21        93.79
>       512         1000         5.96         5.96         5.96       163.77
>      1024         1000         7.88         7.89         7.89       247.54
>      2048         1000        11.42        11.42        11.42       342.00
>      4096         1000        15.33        15.33        15.33       509.49
>      8192         1000        22.19        22.20        22.20       703.83
>     16384         1000        34.57        34.57        34.57       903.88
>     32768         1000        51.32        51.32        51.32      1217.94
>     65536          640        85.80        85.81        85.80      1456.74
>    131072          320       155.23       155.24       155.24      1610.40
>    262144          160       301.84       301.86       301.85      1656.39
>    524288           80       598.62       598.69       598.66      1670.31
>   1048576           40      1175.22      1175.30      1175.26      1701.69
>   2097152           20      2309.05      2309.05      2309.05      1732.32
>   4194304           10      4548.72      4548.98      4548.85      1758.64
> [0] Abort: Got FATAL event 3
> at line 796 in file ibv_channel_manager.c
> rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> If, however, I define my mpdring to contain only Connect-X systems OR
> only Arbel systems, IMB-MPI1 runs to completion.
>
> Can anyone suggest a workaround, or is this a real bug with mvapich2?
> > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Pavel Shamis (Pasha) Mellanox Technologies From bs at q-leap.de Sun Apr 6 09:05:54 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sun, 6 Apr 2008 18:05:54 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <1207401583.15625.224.camel@hrosenstock-ws.xsigo.com> References: <200804050012.39893.bs@q-leap.de> <1207401583.15625.224.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080406160554.GA28695@lanczos.q-leap.de> Hello Hal, On Sat, Apr 05, 2008 at 06:19:43AM -0700, Hal Rosenstock wrote: > Hi Bernd, > > On Sat, 2008-04-05 at 00:12 +0200, Bernd Schubert wrote: > > Hello, > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > > much better there, at least no further RcvSwRelayErrors, even when the > > cluster is in idle state and so far also no SymbolErrors, which we also have > > seens before. > > > > However, after I just started a lustre stress test on 50 clients (to a lustre > > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > > 9000 XmtDiscards within 30 minutes. > > > > Searching for this error I find "This is a symptom of congestion and may > > require tweaking either HOQ or switch lifetime values". > > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > > it. I also do not have an idea to set switch lifetime values. I guess this > > isn't related to the opensm timeout option, is it? > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > > switches, but is this also possible on Flextronics switches? 
> What routing algorithm are you using ? Rather than play with those
> switch values, if you are not using up/down, could you try that to see
> if it helps with the congestion you are seeing ?

I now configured up/down, but still got XmtDiscards, though only on one port.

Error check on lid 205 (SW_pfs1_leaf2) port all:  FAILED
#warn: counter XmtDiscards = 6213 (threshold 100) lid 205 port 1
Error check on lid 205 (SW_pfs1_leaf2) port 1:  FAILED
#warn: counter RcvSwRelayErrors = 1431 (threshold 100) lid 205 port 13
Error check on lid 205 (SW_pfs1_leaf2) port 13:  FAILED

I'm also not sure if up/down is the optimal algorithm for a fabric with only two switches. Since describing the connections in words is a bit difficult, I just uploaded a drawing here:

http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ib/Interswitch-cabling.pdf

The root-guid for the up/down algorithm is leaf-5 of the small switch. But I'm still not sure about up/down at all. Doesn't one need at least 3 switches for up/down? Something like the ASCII graphic below?

        root-switch
        /         \
       /           \
    Sw-1 ------------ Sw-2

Thanks for your help,
Bernd

PS: These RcvSwRelayErrors are also back again. I think these occur on some Lustre operations. Even if these RcvSwRelayErrors are not critical, they are still a bit annoying, since they make it hard to find other errors in the output of ibcheckerrors. If we can really ignore these errors, I will write a patch to not display them by default.
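The suppression Bernd proposes for the noisy RcvSwRelayErrors counters could be sketched as a small filter over ibcheckerrors "#warn" lines. Note that ibcheckerrors itself is a shell script, so the function name and flag below are purely illustrative, not the real implementation:

```c
#include <assert.h>
#include <string.h>

/* Decide whether an ibcheckerrors-style "#warn" line should be printed.
 * Sketch only: a real patch would likely make the list of ignored
 * counters configurable rather than hard-coding one name. */
static int show_warn_line(const char *line, int hide_relay_errors)
{
	if (hide_relay_errors && strstr(line, "RcvSwRelayErrors") != NULL)
		return 0;	/* suppress the noisy counter */
	return 1;		/* show everything else */
}
```

With the flag set, XmtDiscards warnings still show while RcvSwRelayErrors lines are dropped, which is exactly the signal-to-noise improvement the PS asks for.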
From bs at q-leap.de Sun Apr 6 09:09:41 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sun, 6 Apr 2008 18:09:41 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080406065314.GA13374@sashak.voltaire.com> References: <200804050012.39893.bs@q-leap.de> <20080404152932.5e294e47.weiny2@llnl.gov> <20080404234547.GA17618@lanczos.q-leap.de> <20080406065314.GA13374@sashak.voltaire.com> Message-ID: <20080406160941.GA28798@lanczos.q-leap.de> Hello Sasha, On Sun, Apr 06, 2008 at 06:53:14AM +0000, Sasha Khapyorsky wrote: > On 01:45 Sat 05 Apr , Bernd Schubert wrote: > > > > Hmm, I first increased head_of_queue_lifetime to 0x13 and > > leaf_head_of_queue_lifetime to 0x20, but this didn't make the error > > go away. So I increased head_of_queue_lifetime to 0x15 and > > leaf_head_of_queue_lifetime to 0x50, but this made the fabric to entirely > > crash. > > Are you using default (min hops) routing? I think it could be deadlock > due to unlimited head_of_queue_lifetime values. > > > On the node of the master opensm I got an endless number of messages > > like these: > > > > Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out > > Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs > > Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 > > Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out > > > > The slave opensm also went into D-state and is not killable anymore :( > > Interesting... Any more details about this? unfortunately not. 
As you may see, it was rather late already and I just wanted to get the entire system working, so I rebooted both nodes running the opensms :(

Thanks,
Bernd

From huanwei at cse.ohio-state.edu Sun Apr 6 17:57:59 2008
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Sun, 6 Apr 2008 20:57:59 -0400 (EDT)
Subject: [ofa-general] MVAPICH2 crashes on mixed fabric
In-Reply-To: 
Message-ID: 

Hi Mike,

Currently mvapich2 detects different HCA types and thus selects different parameters for communication, which may cause the problem. We are working on this feature and it will be available in our next release. For now, if you want to run on this setup, please set a few environment variables, like:

mpiexec -n 2 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 ./a.out

Please let us know if this works. Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

On Fri, 4 Apr 2008, Mike Heinz wrote:

> Hey, all, I'm not sure if this is a known bug or some sort of limitation
> I'm unaware of, but I've been building and testing with the OFED 1.3 GA
> release on a small fabric that has a mix of Arbel-based and newer
> Connect-X HCAs.
>
> What I've discovered is that mvapich and openmpi work fine across the
> entire fabric, but mvapich2 crashes when I use a mix of Arbels and
> Connect-X. The errors vary depending on the test program but here's an
> example:
>
> [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
> .
> .
> .
> (output snipped)
> .
> .
> .
> #-----------------------------------------------------------------------------
> # Benchmarking Sendrecv
> # #processes = 2
> # ( 3 additional processes waiting in MPI_Barrier)
> #-----------------------------------------------------------------------------
>    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
>         0         1000         3.51         3.51         3.51         0.00
>         1         1000         3.63         3.63         3.63         0.52
>         2         1000         3.67         3.67         3.67         1.04
>         4         1000         3.64         3.64         3.64         2.09
>         8         1000         3.67         3.67         3.67         4.16
>        16         1000         3.67         3.67         3.67         8.31
>        32         1000         3.74         3.74         3.74        16.32
>        64         1000         3.90         3.90         3.90        31.28
>       128         1000         4.75         4.75         4.75        51.39
>       256         1000         5.21         5.21         5.21        93.79
>       512         1000         5.96         5.96         5.96       163.77
>      1024         1000         7.88         7.89         7.89       247.54
>      2048         1000        11.42        11.42        11.42       342.00
>      4096         1000        15.33        15.33        15.33       509.49
>      8192         1000        22.19        22.20        22.20       703.83
>     16384         1000        34.57        34.57        34.57       903.88
>     32768         1000        51.32        51.32        51.32      1217.94
>     65536          640        85.80        85.81        85.80      1456.74
>    131072          320       155.23       155.24       155.24      1610.40
>    262144          160       301.84       301.86       301.85      1656.39
>    524288           80       598.62       598.69       598.66      1670.31
>   1048576           40      1175.22      1175.30      1175.26      1701.69
>   2097152           20      2309.05      2309.05      2309.05      1732.32
>   4194304           10      4548.72      4548.98      4548.85      1758.64
> [0] Abort: Got FATAL event 3
> at line 796 in file ibv_channel_manager.c
> rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> If, however, I define my mpdring to contain only Connect-X systems OR
> only Arbel systems, IMB-MPI1 runs to completion.
>
> Can anyone suggest a workaround, or is this a real bug with mvapich2?
> > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > From balaji at mcs.anl.gov Sun Apr 6 17:16:11 2008 From: balaji at mcs.anl.gov (Pavan Balaji) Date: Sun, 06 Apr 2008 19:16:11 -0500 Subject: [ofa-general] [p2s2-announce] Deadline Extension: International Workshop on Parallel Programming Models and Systems Software (P2S2) Message-ID: <47F967CB.7020905@mcs.anl.gov> Due to several requests, we have extended the deadline for the P2S2 workshop to April 25th. Please find the detailed CFP below. ---------------------------------------------------------------------- CALL FOR PAPERS =============== First International Workshop on Parallel Programming Models and Systems Software for High-end Computing (P2S2) (http://www.mcs.anl.gov/events/workshops/p2s2) Sep. 8th, 2008 To be held in conjunction with ICPP-08: The 27th International Conference on Parallel Processing Sep. 8-12, 2008 Portland, Oregon, USA SCOPE ----- The goal of this workshop is to bring together researchers and practitioners in parallel programming models and systems software for high-end computing systems. Please join us in a discussion of new ideas, experiences, and the latest trends in these areas at the workshop. 
TOPICS OF INTEREST ------------------ The focus areas for this workshop include, but are not limited to: * Programming models and their high-performance implementations o MPI, Sockets, OpenMP, Global Arrays, X10, UPC, Chapel o Other Hybrid Programming Models * Systems software for scientific and enterprise computing o Communication sub-subsystems for high-end computing o High-performance File and storage systems o Fault-tolerance techniques and implementations o Efficient and high-performance virtualization and other management mechanisms * Tools for Management, Maintenance, Coordination and Synchronization o Software for Enterprise Data-centers using Modern Architectures o Job scheduling libraries o Management libraries for large-scale system o Toolkits for process and task coordination on modern platforms * Performance evaluation, analysis and modeling of emerging computing platforms PROCEEDINGS ----------- Proceedings of this workshop will be published by the IEEE Computer Society (together with the ICPP conference proceedings) in CD format only and will be available at the conference. SUBMISSION INSTRUCTIONS ----------------------- Submissions should be in PDF format in U.S. Letter size paper. They should not exceed 8 pages (all inclusive). Submissions will be judged based on relevance, significance, originality, correctness and clarity. DATES AND DEADLINES ------------------- Paper Submission: Extended to April 25th, 2008 Author Notification: May 20th, 2008 Camera Ready: June 2nd, 2008 PROGRAM CHAIRS -------------- * Pavan Balaji (Argonne National Laboratory) * Sayantan Sur (IBM Research) STEERING COMMITTEE ------------------ * William D. Gropp (University of Illinois Urbana-Champaign) * Dhabaleswar K. 
Panda (Ohio State University) * Vijay Saraswat (IBM Research) PROGRAM COMMITTEE ----------------- * David Bernholdt (Oak Ridge National Laboratory) * Ron Brightwell (Sandia National Laboratory) * Wu-chun Feng (Virginia Tech) * Richard Graham (Oak Ridge National Laboratory) * Hyun-wook Jin (Konkuk University, South Korea) * Sameer Kumar (IBM Research) * Doug Lea (State University of New York at Oswego) * Jarek Nieplocha (Pacific Northwest National Laboratory) * Scott Pakin (Los Alamos National Laboratory) * Vivek Sarkar (Rice University) * Rajeev Thakur (Argonne National Laboratory) * Pete Wyckoff (Ohio Supercomputing Center) If you have any questions, please contact us at p2s2-chairs at mcs.anl.gov ======================================================================== If you do not want to receive any more announcements regarding the P2S2 workshop, please send an email to majordomo at mcs.anl.gov with the email body (not email subject) as "unsubscribe p2s2-announce". ======================================================================== -- Pavan Balaji http://www.mcs.anl.gov/~balaji From clameter at sgi.com Sun Apr 6 22:45:41 2008 From: clameter at sgi.com (Christoph Lameter) Date: Sun, 6 Apr 2008 22:45:41 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] mmu notifier #v11 In-Reply-To: <20080405002330.GF14784@duo.random> References: <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> <20080404202055.GA14784@duo.random> <20080405002330.GF14784@duo.random> Message-ID: On Sat, 5 Apr 2008, Andrea Arcangeli wrote: > In short when working with single pages it's a waste to block the > secondary-mmu page fault, because it's zero cost to invalidate_page > before put_page. Not even GRU need to do that. That depends on what the notifier is being used for. Some serialization with the external mappings has to be done anyways. And its cleaner to have one API that does a lock/unlock scheme. 
Atomic operations can easily lead to races. From clameter at sgi.com Sun Apr 6 22:48:56 2008 From: clameter at sgi.com (Christoph Lameter) Date: Sun, 6 Apr 2008 22:48:56 -0700 (PDT) Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic In-Reply-To: <20080405005759.GH14784@duo.random> References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> Message-ID: On Sat, 5 Apr 2008, Andrea Arcangeli wrote: > > + rcu_assign_pointer(mm->emm_notifier, e); > > + mm_unlock(mm); > > My mm_lock solution makes all rcu serialization an unnecessary > overhead so you should remove it like I already did in #v11. If it > wasn't the case, then mm_lock wouldn't be a definitive fix for the > race. There still could be junk in the cache of one cpu. If you just read the new pointer but use the earlier content pointed to then you have a problem. So a memory fence / barrier is needed to guarantee that the contents pointed to are fetched after the pointer. From andrea at qumranet.com Sun Apr 6 23:02:34 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Mon, 7 Apr 2008 08:02:34 +0200 Subject: [ofa-general] Re: [PATCH] mmu notifier #v11 In-Reply-To: References: <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> <20080404202055.GA14784@duo.random> <20080405002330.GF14784@duo.random> Message-ID: <20080407060234.GD9309@duo.random> On Sun, Apr 06, 2008 at 10:45:41PM -0700, Christoph Lameter wrote: > That depends on what the notifier is being used for. Some serialization > with the external mappings has to be done anyways. And its cleaner to have As far as I can tell no, you don't need to serialize against the secondary mmu page fault in invalidate_page, like you instead have to do in range_begin if you don't unpin the pages in range_end. > one API that does a lock/unlock scheme. Atomic operations can easily lead > to races. What races? 
Note that if you don't want to optimize, XPMEM and GRU can feel free to implement their own invalidate_page as this:

	invalidate_page(mm, addr)
	{
		range_begin(mm, addr, addr+PAGE_SIZE)
		range_end(mm, addr, addr+PAGE_SIZE)
	}

There's zero risk of adding races if they do this, but I doubt they want to run as slow as with EMM, so I guess they'll exploit the optimization by going lock-free vs the spte page fault in invalidate_page.

From andrea at qumranet.com Sun Apr 6 23:06:02 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Mon, 7 Apr 2008 08:06:02 +0200
Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic
In-Reply-To: 
References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random>
Message-ID: <20080407060602.GE9309@duo.random>

On Sun, Apr 06, 2008 at 10:48:56PM -0700, Christoph Lameter wrote:
> On Sat, 5 Apr 2008, Andrea Arcangeli wrote:
>
> > > + rcu_assign_pointer(mm->emm_notifier, e);
> > > + mm_unlock(mm);
> >
> > My mm_lock solution makes all rcu serialization an unnecessary
> > overhead so you should remove it like I already did in #v11. If it
> > wasn't the case, then mm_lock wouldn't be a definitive fix for the
> > race.
>
> There still could be junk in the cache of one cpu. If you just read the
> new pointer but use the earlier content pointed to then you have a
> problem.

There can't be junk: spinlocks provide proper memory barrier semantics, just like rcu, so it's entirely superfluous. There could be junk only if any of the mmu_notifier_* methods were invoked _outside_ the i_mmap_lock and _outside_ the anon_vma lock and outside the mmap_sem, which is never the case of course.

> So a memory fence / barrier is needed to guarantee that the contents
> pointed to are fetched after the pointer.

It's not needed... if you were right, we could never possibly run a list_for_each inside any spinlock-protected critical section and we'd always need to use the _rcu version instead.
The _rcu version is needed only when the list walk happens outside the spinlock critical section, of course (rcu = no spinlock cacheline-exclusive write operation on the read side; here the read side takes the spinlock big time).

From clameter at sgi.com Sun Apr 6 23:20:08 2008
From: clameter at sgi.com (Christoph Lameter)
Date: Sun, 6 Apr 2008 23:20:08 -0700 (PDT)
Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic
In-Reply-To: <20080407060602.GE9309@duo.random>
References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> <20080407060602.GE9309@duo.random>
Message-ID: 

On Mon, 7 Apr 2008, Andrea Arcangeli wrote:

> > > My mm_lock solution makes all rcu serialization an unnecessary
> > > overhead so you should remove it like I already did in #v11. If it
> > > wasn't the case, then mm_lock wouldn't be a definitive fix for the
> > > race.
> >
> > There still could be junk in the cache of one cpu. If you just read the
> > new pointer but use the earlier content pointed to then you have a
> > problem.
>
> There can't be junk, spinlocks provides semantics of proper memory
> barriers, just like rcu, so it's entirely superflous.
>
> There could be junk only if any of the mmu_notifier_* methods would be
> invoked _outside_ the i_mmap_lock and _outside_ the anon_vma and
> outside the mmap_sem, that is never the case of course.

So we use other locks to perform serialization on the list chains? Basically the list chains are protected by either mmap_sem or an rmap lock? We need to document that.

In that case we could also add an unregister function.
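The locking argument in this exchange can be modeled in a few lines of userspace C: if every walk of the notifier chain happens under the same lock that registration takes (standing in for mmap_sem / i_mmap_lock here), a plain linked list and plain loads are enough, and no _rcu list variants or extra memory barriers are required. All names below are illustrative; this is a sketch, not the kernel mmu-notifier API.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Illustrative notifier node; the callback pointer stands in for
 * methods like invalidate_page. */
struct notifier {
	struct notifier *next;
	void (*invalidate_page)(unsigned long addr);
};

static pthread_mutex_t mm_lock = PTHREAD_MUTEX_INITIALIZER;
static struct notifier *chain;

static void notifier_register(struct notifier *n)
{
	pthread_mutex_lock(&mm_lock);	/* writers serialize on mm_lock */
	n->next = chain;
	chain = n;
	pthread_mutex_unlock(&mm_lock);
}

/* Readers take the same lock, so they can never observe a half-built
 * node; returns the number of callbacks fired. */
static int notify_invalidate_page(unsigned long addr)
{
	struct notifier *n;
	int fired = 0;

	pthread_mutex_lock(&mm_lock);
	for (n = chain; n != NULL; n = n->next) {
		if (n->invalidate_page)
			n->invalidate_page(addr);
		fired++;
	}
	pthread_mutex_unlock(&mm_lock);
	return fired;
}

/* Minimal self-check: register two notifiers, walk the chain once. */
static int demo(void)
{
	static struct notifier n1, n2;

	notifier_register(&n1);
	notifier_register(&n2);
	return notify_invalidate_page(0x1000);
}
```

The design point being debated is exactly this: once reader and writer share one lock, rcu_assign_pointer and list_for_each_rcu add nothing, because the mutex's acquire/release ordering already guarantees the reader sees fully initialized nodes.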
From andrea at qumranet.com Mon Apr 7 00:13:30 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Mon, 7 Apr 2008 09:13:30 +0200
Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic
In-Reply-To: 
References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> <20080407060602.GE9309@duo.random>
Message-ID: <20080407071330.GH9309@duo.random>

On Sun, Apr 06, 2008 at 11:20:08PM -0700, Christoph Lameter wrote:
> On Mon, 7 Apr 2008, Andrea Arcangeli wrote:
>
> > > > My mm_lock solution makes all rcu serialization an unnecessary
> > > > overhead so you should remove it like I already did in #v11. If it
> > > > wasn't the case, then mm_lock wouldn't be a definitive fix for the
> > > > race.
> > >
> > > There still could be junk in the cache of one cpu. If you just read the
> > > new pointer but use the earlier content pointed to then you have a
> > > problem.
> >
> > There can't be junk, spinlocks provides semantics of proper memory
> > barriers, just like rcu, so it's entirely superflous.
> >
> > There could be junk only if any of the mmu_notifier_* methods would be
> > invoked _outside_ the i_mmap_lock and _outside_ the anon_vma and
> > outside the mmap_sem, that is never the case of course.
>
> So we use other locks to perform serialization on the list chains?
> Basically the list chains are protected by either mmap_sem or an rmap
> lock? We need to document that.

I thought it was obvious: if it wasn't the case, how could mm_lock fix any range_begin/range_end race? Also, to document it you just have to remove _rcu. The only confusion could arise from reading your patch; mine couldn't raise any doubt that rcu isn't needed and that regular spinlocks/semaphores serialize all methods.

> In that case we could also add an unregister function.

Indeed, but it still can't run after mm_users == 0. So for unregister to work one has to boost the mm_users first.
exit_mmap doesn't take any lock when destroying the mm because it assumes nobody is messing with the mm at that time. So that requirement doesn't change, but now one can unregister before mm_users is dropped to 0. Also I wonder if I should make a new version of the mm_lock/unlock so that they will guarantee SIGKILL handling in O(N) anywhere inside mm_lock or mm_unlock, where N is the number of vmas, that will either require a VM_MM_LOCK_I/VM_MM_LOCK_A bitflag, or a vmalloc of two bitflag arrays inside the mmap_sem critical section returned by mm_lock as a cookie and passed as param to mm_unlock. The SIGKILL check is mostly worthless in spin_lock context (especially on UP or low-smp) but given the later patches switches all relevant VM locks to mutexes (this should happen under a config option to avoid hurting server performance), it might be worth it. That will require mmu_notifier_register to return both -EINTR and -ENOMEM if using the vmalloc trick to avoid registering two more vm_flags bitflags. Alternatively we can have mm_lock fail with -EPERM if there aren't enough capabilities and the number of vmas is bigger than a certain number. This is more or less like the requirement to attach during startup. This is preferable IMHO because it's effective even without preempt-rt and in turn with all locks being spinlocks for maximum performance, so I'll likely release #v12 with this change. In any case the mmu_notifier_register will need to return error (an unregister as well for that matter). But those are very minor issues, #v11 can go in -mm now to ensure mmu notifiers will be shipped with 2.6.26rc. From 2rsvn at longbeachgardenhotel.com Mon Apr 7 02:20:04 2008 From: 2rsvn at longbeachgardenhotel.com (auberon william) Date: Mon, 07 Apr 2008 09:20:04 +0000 Subject: [ofa-general] genuine artifacts Message-ID: <000601c8989f$01be5f5b$76becfa8@xuqpm> Detailed quality replicas of the most wanted designer watches are here! 
Genuine solid stainless steel, and 99.9% accurate markings, finish and weight. - The worlds largest online retailer of luxury products, including: Rolex Sports Models Rolex Datejusts Breitling Cartier Porsche Design Dolce & Gabbana Dior Gucci Hermes Watches Patek Philippe Visit - www.spoooke.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ossrosch at linux.vnet.ibm.com Mon Apr 7 04:31:39 2008 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Mon, 7 Apr 2008 12:31:39 +0100 Subject: [ofa-general] Plan for OFED-1.3.1? Message-ID: <200804071331.42031.ossrosch@linux.vnet.ibm.com> Hi, is there any schedule for the OFED-1.3.1 release? When should we start to send some minor bugfixes for ehca? Would the kernel-base be the same 2.6.24 or will it change to 2.6.25? regards Stefan From ossrosch at linux.vnet.ibm.com Mon Apr 7 05:57:33 2008 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Mon, 7 Apr 2008 13:57:33 +0100 Subject: [ofa-general] [PATCH] IB/ehca: extend query_device() and query_port() to support all values for ibv_devinfo Message-ID: <200804071457.36248.ossrosch@linux.vnet.ibm.com> Also, introduce a few inline helper functions to make the code more readable. 
Signed-off-by: Stefan Roscher
---
 drivers/infiniband/hw/ehca/ehca_hca.c |  128 ++++++++++++++++++++------------
 1 files changed, 80 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 8832123..f89c5f8 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -43,6 +43,11 @@
 #include "ehca_iverbs.h"
 #include "hcp_if.h"
 
+static inline unsigned int limit_uint(unsigned int value)
+{
+	return min_t(unsigned int, value, INT_MAX);
+}
+
 int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 {
 	int i, ret = 0;
@@ -83,37 +88,40 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 	props->vendor_id = rblock->vendor_id >> 8;
 	props->vendor_part_id = rblock->vendor_part_id >> 16;
 	props->hw_ver = rblock->hw_ver;
-	props->max_qp = min_t(unsigned, rblock->max_qp, INT_MAX);
-	props->max_qp_wr = min_t(unsigned, rblock->max_wqes_wq, INT_MAX);
-	props->max_sge = min_t(unsigned, rblock->max_sge, INT_MAX);
-	props->max_sge_rd = min_t(unsigned, rblock->max_sge_rd, INT_MAX);
-	props->max_cq = min_t(unsigned, rblock->max_cq, INT_MAX);
-	props->max_cqe = min_t(unsigned, rblock->max_cqe, INT_MAX);
-	props->max_mr = min_t(unsigned, rblock->max_mr, INT_MAX);
-	props->max_mw = min_t(unsigned, rblock->max_mw, INT_MAX);
-	props->max_pd = min_t(unsigned, rblock->max_pd, INT_MAX);
-	props->max_ah = min_t(unsigned, rblock->max_ah, INT_MAX);
-	props->max_fmr = min_t(unsigned, rblock->max_mr, INT_MAX);
+	props->max_qp = limit_uint(rblock->max_qp);
+	props->max_qp_wr = limit_uint(rblock->max_wqes_wq);
+	props->max_sge = limit_uint(rblock->max_sge);
+	props->max_sge_rd = limit_uint(rblock->max_sge_rd);
+	props->max_cq = limit_uint(rblock->max_cq);
+	props->max_cqe = limit_uint(rblock->max_cqe);
+	props->max_mr = limit_uint(rblock->max_mr);
+	props->max_mw = limit_uint(rblock->max_mw);
+	props->max_pd = limit_uint(rblock->max_pd);
+	props->max_ah = limit_uint(rblock->max_ah);
+	props->max_ee = limit_uint(rblock->max_rd_ee_context);
+	props->max_rdd = limit_uint(rblock->max_rd_domain);
+	props->max_fmr = limit_uint(rblock->max_mr);
+	props->local_ca_ack_delay = limit_uint(rblock->local_ca_ack_delay);
+	props->max_qp_rd_atom = limit_uint(rblock->max_rr_qp);
+	props->max_ee_rd_atom = limit_uint(rblock->max_rr_ee_context);
+	props->max_res_rd_atom = limit_uint(rblock->max_rr_hca);
+	props->max_qp_init_rd_atom = limit_uint(rblock->max_act_wqs_qp);
+	props->max_ee_init_rd_atom = limit_uint(rblock->max_act_wqs_ee_context);
 
 	if (EHCA_BMASK_GET(HCA_CAP_SRQ, shca->hca_cap)) {
-		props->max_srq = props->max_qp;
-		props->max_srq_wr = props->max_qp_wr;
+		props->max_srq = limit_uint(props->max_qp);
+		props->max_srq_wr = limit_uint(props->max_qp_wr);
 		props->max_srq_sge = 3;
 	}
 
-	props->max_pkeys = 16;
-	props->local_ca_ack_delay
-		= rblock->local_ca_ack_delay;
-	props->max_raw_ipv6_qp
-		= min_t(unsigned, rblock->max_raw_ipv6_qp, INT_MAX);
-	props->max_raw_ethy_qp
-		= min_t(unsigned, rblock->max_raw_ethy_qp, INT_MAX);
-	props->max_mcast_grp
-		= min_t(unsigned, rblock->max_mcast_grp, INT_MAX);
-	props->max_mcast_qp_attach
-		= min_t(unsigned, rblock->max_mcast_qp_attach, INT_MAX);
+	props->max_pkeys = 16;
+	props->local_ca_ack_delay = limit_uint(rblock->local_ca_ack_delay);
+	props->max_raw_ipv6_qp = limit_uint(rblock->max_raw_ipv6_qp);
+	props->max_raw_ethy_qp = limit_uint(rblock->max_raw_ethy_qp);
+	props->max_mcast_grp = limit_uint(rblock->max_mcast_grp);
+	props->max_mcast_qp_attach = limit_uint(rblock->max_mcast_qp_attach);
 	props->max_total_mcast_qp_attach
-		= min_t(unsigned, rblock->max_total_mcast_qp_attach, INT_MAX);
+		= limit_uint(rblock->max_total_mcast_qp_attach);
 
 	/* translate device capabilities */
 	props->device_cap_flags = IB_DEVICE_SYS_IMAGE_GUID |
@@ -128,6 +136,46 @@ query_device1:
 	return ret;
 }
 
+static inline int map_mtu(struct ehca_shca *shca, u32 fw_mtu)
+{
+	switch (fw_mtu) {
+	case 0x1:
+		return IB_MTU_256;
+	case 0x2:
+		return IB_MTU_512;
+	case 0x3:
+		return IB_MTU_1024;
+	case 0x4:
+		return IB_MTU_2048;
+	case 0x5:
+		return IB_MTU_4096;
+	default:
+		ehca_err(&shca->ib_device, "Unknown MTU size: %x.",
+			 fw_mtu);
+		return 0;
+	}
+}
+
+static inline int map_number_of_vls(struct ehca_shca *shca, u32 vl_cap)
+{
+	switch (vl_cap) {
+	case 0x1:
+		return 1;
+	case 0x2:
+		return 2;
+	case 0x3:
+		return 4;
+	case 0x4:
+		return 8;
+	case 0x5:
+		return 15;
+	default:
+		ehca_err(&shca->ib_device, "invalid Vl Capability: %x.",
+			 vl_cap);
+		return 0;
+	}
+}
+
 int ehca_query_port(struct ib_device *ibdev,
 		    u8 port, struct ib_port_attr *props)
 {
@@ -152,31 +200,14 @@ int ehca_query_port(struct ib_device *ibdev,
 
 	memset(props, 0, sizeof(struct ib_port_attr));
 
-	switch (rblock->max_mtu) {
-	case 0x1:
-		props->active_mtu = props->max_mtu = IB_MTU_256;
-		break;
-	case 0x2:
-		props->active_mtu = props->max_mtu = IB_MTU_512;
-		break;
-	case 0x3:
-		props->active_mtu = props->max_mtu = IB_MTU_1024;
-		break;
-	case 0x4:
-		props->active_mtu = props->max_mtu = IB_MTU_2048;
-		break;
-	case 0x5:
-		props->active_mtu = props->max_mtu = IB_MTU_4096;
-		break;
-	default:
-		ehca_err(&shca->ib_device, "Unknown MTU size: %x.",
-			 rblock->max_mtu);
-		break;
-	}
-
+	props->active_mtu = props->max_mtu = map_mtu(shca, rblock->max_mtu);
 	props->port_cap_flags = rblock->capability_mask;
 	props->gid_tbl_len = rblock->gid_tbl_len;
-	props->max_msg_sz = rblock->max_msg_sz;
+	if (rblock->max_msg_sz) {
+		props->max_msg_sz = rblock->max_msg_sz;
+	} else {
+		props->max_msg_sz = 0x1 << 31;
+	}
 	props->bad_pkey_cntr = rblock->bad_pkey_cntr;
 	props->qkey_viol_cntr = rblock->qkey_viol_cntr;
 	props->pkey_tbl_len = rblock->pkey_tbl_len;
@@ -186,6 +217,7 @@ int ehca_query_port(struct ib_device *ibdev,
 	props->sm_sl = rblock->sm_sl;
 	props->subnet_timeout = rblock->subnet_timeout;
 	props->init_type_reply = rblock->init_type_reply;
+	props->max_vl_num = map_number_of_vls(shca, rblock->vl_cap);
 
 	if (rblock->state && rblock->phys_width) {
 		props->phys_state = rblock->phys_pstate;
-- 
1.5.2

From michael.heinz at qlogic.com  Mon Apr  7 06:20:48 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Mon, 7 Apr 2008 08:20:48 -0500
Subject: [ofa-general] MVAPICH2 crashes on mixed fabric
In-Reply-To:
References:
Message-ID:

Wei,

Thanks so much for the tip - I'll give it a try.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: wei huang [mailto:huanwei at cse.ohio-state.edu]
Sent: Sunday, April 06, 2008 8:58 PM
To: Mike Heinz
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] MVAPICH2 crashes on mixed fabric

Hi Mike,

Currently mvapich2 will detect different HCA types and thus select
different parameters for communication, which may cause the problem. We
are working on this feature and it will be available in our next release.
For now, if you want to run on this setup, please set a few environment
variables like:

mpiexec -n 2 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 ./a.out

Please let us know if this works. Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

On Fri, 4 Apr 2008, Mike Heinz wrote:

> Hey, all, I'm not sure if this is a known bug or some sort of
> limitation I'm unaware of, but I've been building and testing with the
> OFED 1.3 GA release on a small fabric that has a mix of Arbel-based
> and newer Connect-X HCAs.
>
> What I've discovered is that mvapich and openmpi work fine across the
> entire fabric, but mvapich2 crashes when I use a mix of Arbels and
> Connect-X. The errors vary depending on the test program but here's an
> example:
>
> [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
> .
> .
> .
> (output snipped)
> .
> .
> .
>
> #-----------------------------------------------------------------------------
> # Benchmarking Sendrecv
> # #processes = 2
> # ( 3 additional processes waiting in MPI_Barrier)
> #-----------------------------------------------------------------------------
>        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
>             0         1000         3.51         3.51         3.51         0.00
>             1         1000         3.63         3.63         3.63         0.52
>             2         1000         3.67         3.67         3.67         1.04
>             4         1000         3.64         3.64         3.64         2.09
>             8         1000         3.67         3.67         3.67         4.16
>            16         1000         3.67         3.67         3.67         8.31
>            32         1000         3.74         3.74         3.74        16.32
>            64         1000         3.90         3.90         3.90        31.28
>           128         1000         4.75         4.75         4.75        51.39
>           256         1000         5.21         5.21         5.21        93.79
>           512         1000         5.96         5.96         5.96       163.77
>          1024         1000         7.88         7.89         7.89       247.54
>          2048         1000        11.42        11.42        11.42       342.00
>          4096         1000        15.33        15.33        15.33       509.49
>          8192         1000        22.19        22.20        22.20       703.83
>         16384         1000        34.57        34.57        34.57       903.88
>         32768         1000        51.32        51.32        51.32      1217.94
>         65536          640        85.80        85.81        85.80      1456.74
>        131072          320       155.23       155.24       155.24      1610.40
>        262144          160       301.84       301.86       301.85      1656.39
>        524288           80       598.62       598.69       598.66      1670.31
>       1048576           40      1175.22      1175.30      1175.26      1701.69
>       2097152           20      2309.05      2309.05      2309.05      1732.32
>       4194304           10      4548.72      4548.98      4548.85      1758.64
> [0] Abort: Got FATAL event 3
> at line 796 in file ibv_channel_manager.c
> rank 0 in job 1  compute-0-0.local_36049   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9
>
> If, however, I define my mpdring to contain only Connect-X systems OR
> only Arbel systems, IMB-MPI1 runs to completion.
>
> Can anyone suggest a workaround or is this a real bug with mvapich2?
> > -- > Michael Heinz > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania > > From hrosenstock at xsigo.com Mon Apr 7 06:35:10 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 07 Apr 2008 06:35:10 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080406160554.GA28695@lanczos.q-leap.de> References: <200804050012.39893.bs@q-leap.de> <1207401583.15625.224.camel@hrosenstock-ws.xsigo.com> <20080406160554.GA28695@lanczos.q-leap.de> Message-ID: <1207575310.15625.258.camel@hrosenstock-ws.xsigo.com> Hi Bernd, On Sun, 2008-04-06 at 18:05 +0200, Bernd Schubert wrote: > Hello Hal, > > On Sat, Apr 05, 2008 at 06:19:43AM -0700, Hal Rosenstock wrote: > > Hi Bernd, > > > > On Sat, 2008-04-05 at 00:12 +0200, Bernd Schubert wrote: > > > Hello, > > > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > > > much better there, at least no further RcvSwRelayErrors, even when the > > > cluster is in idle state and so far also no SymbolErrors, which we also have > > > seens before. > > > > > > However, after I just started a lustre stress test on 50 clients (to a lustre > > > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > > > 9000 XmtDiscards within 30 minutes. > > > > > > Searching for this error I find "This is a symptom of congestion and may > > > require tweaking either HOQ or switch lifetime values". > > > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > > > it. I also do not have an idea to set switch lifetime values. I guess this > > > isn't related to the opensm timeout option, is it? > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > > > switches, but is this also possible on Flextronics switches? > > > > What routing algorithm are you using ? Rather than play with those > > switch values, if you are not using up/down, could you try that to see > > if it helps with the congestion you are seeing ? 
> I now configured up/down, but still got XmtDiscards, though, only on one port.
>
> Error check on lid 205 (SW_pfs1_leaf2) port all:  FAILED
> #warn: counter XmtDiscards = 6213 (threshold 100) lid 205 port 1
> Error check on lid 205 (SW_pfs1_leaf2) port 1:  FAILED
> #warn: counter RcvSwRelayErrors = 1431 (threshold 100) lid 205 port 13
> Error check on lid 205 (SW_pfs1_leaf2) port 13:  FAILED

Are you running IPoIB ? If so, SwRelayErrors are not necessarily
indicative of a "real" issue, because multicasts reflected on the same
port are mistakenly counted.

> I'm also not sure whether up/down is the optimal algorithm for a fabric
> with only two switches.
>
> Since describing the connections in words is a bit difficult, I just
> uploaded a drawing here:
>
> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ib/Interswitch-cabling.pdf
>
> The root-guid for the up/down algorithm is leaf-5 of the small switch. But
> I'm still not sure about up/down at all. Doesn't one need at least
> 3 switches for up/down? Something like the ASCII graphic below?
>
>        root-switch
>       /           \
>      /             \
>    Sw-1 ------------ Sw-2

Doesn't your chassis switch have many switches in it ? You did say it
was 144 ports, so it's made up of a number of switches.

You may need to choose a "better" root than up/down automatically
determines.

-- Hal

> Thanks for your help,
> Bernd
>
> PS: These RcvSwRelayErrors are also back again. I think these occur on some
> operations of Lustre. Even if these RcvSwRelayErrors are not critical, they
> are still a bit annoying, since they make it hard to find other errors in
> the output of ibcheckerrors.
> If we can really ignore these errors, I will write a patch to not display these
> by default.
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From bs at q-leap.de Mon Apr 7 06:53:47 2008 From: bs at q-leap.de (Bernd Schubert) Date: Mon, 7 Apr 2008 15:53:47 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <1207575310.15625.258.camel@hrosenstock-ws.xsigo.com> References: <200804050012.39893.bs@q-leap.de> <20080406160554.GA28695@lanczos.q-leap.de> <1207575310.15625.258.camel@hrosenstock-ws.xsigo.com> Message-ID: <200804071553.47457.bs@q-leap.de> Hello Hal, On Monday 07 April 2008 15:35:10 Hal Rosenstock wrote: > Hi Bernd, > > On Sun, 2008-04-06 at 18:05 +0200, Bernd Schubert wrote: > > Hello Hal, > > > > > > Searching for this error I find "This is a symptom of congestion and > > > > may require tweaking either HOQ or switch lifetime values". > > > > Well, I have to admit I neither know what HOQ is, nor do I know how > > > > to tweak it. I also do not have an idea to set switch lifetime > > > > values. I guess this isn't related to the opensm timeout option, is > > > > it? > > > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on > > > > these switches, but is this also possible on Flextronics switches? > > > > > > What routing algorithm are you using ? Rather than play with those > > > switch values, if you are not using up/down, could you try that to see > > > if it helps with the congestion you are seeing ? > > > > I now configured up/down, but still got XmtDiscards, though, only on one > > port. 
> > > > Error check on lid 205 (SW_pfs1_leaf2) port all: FAILED > > #warn: counter XmtDiscards = 6213 (threshold 100) lid 205 port 1 > > Error check on lid 205 (SW_pfs1_leaf2) port 1: FAILED > > #warn: counter RcvSwRelayErrors = 1431 (threshold 100) lid 205 port 13 > > Error check on lid 205 (SW_pfs1_leaf2) port 13: FAILED > > Are you running IPoIB ? If so, SwRelayErrors are not necessarily > indicative of a "real" issue due to the fact that multicasts reflected > on the same port are mistakenly counted. so far only Lustre did IPoIB for network initialization. Once it finds a working connection it does RDMA. But I'm not sure about what it does in case of problems, e.g. server reboot, I guess it then does again IPoIB. Is there a way to find out if these RcvSwRelayErrors are due to multicast or due to real problems? > > > I'm also not sure if up/down is the optimal algorithm for a fabric with > > only two switches. > > > > Since describing the connections in words is a bit difficult, I just > > upload a drawing here: > > > > http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ib/Interswitch-ca > >bling.pdf > > > > The root-guid for the up/down algorithm is leaf-5 of of the small switch. > > But I'm still not sure about up/down at all. Doesn't one need for up/down > > at least 3 switches? Something like this ascii graphic below? > > > > > > root-switch > > / \ > > / \ > > Sw-1 ------------ Sw-2 > > Doesn't your chassis switch have many switches in it ? You did say it > was 144 ports so it's made up of a number of switches. Yes, it's made up of a number of switches. > > You may need to choose a "better" root than up/down automatically > determines. > Opensm isn't able to detect a root itself at all. As said above I first configured leaf-5 of the small switch (see the pdf file above), but now switched it to leaf-6 guid. I have no idea which would be optimal for our switches - I guess I have to create a drawing from the ibnetdiscover output to figure this out. 
I will also later on try to check with ibutils if it detects errors.

Thanks,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH

From a.p.zijlstra at chello.nl  Mon Apr  7 06:55:48 2008
From: a.p.zijlstra at chello.nl (Peter Zijlstra)
Date: Mon, 07 Apr 2008 15:55:48 +0200
Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim
In-Reply-To: <20080405004127.GG14784@duo.random>
References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> <47F6B5EA.6060106@goop.org> <20080405004127.GG14784@duo.random>
Message-ID: <1207576548.15579.43.camel@twins>

On Sat, 2008-04-05 at 02:41 +0200, Andrea Arcangeli wrote:
> On Fri, Apr 04, 2008 at 04:12:42PM -0700, Jeremy Fitzhardinge wrote:
> > I think you can break this if() down a bit:
> >
> > 	if (!(vma->vm_file && vma->vm_file->f_mapping))
> > 		continue;
>
> It makes no difference at runtime, coding style preferences are quite
> subjective.

I'll have to concur with Jeremy here; please break that monstrous if
statement down. It might not matter to the compiler, but it sure as hell
helps anyone trying to understand/maintain the thing.

From erezz at Voltaire.COM  Mon Apr  7 07:35:39 2008
From: erezz at Voltaire.COM (Erez Zilber)
Date: Mon, 07 Apr 2008 17:35:39 +0300
Subject: [ofa-general] About RDMA_CM_EVENT_DEVICE_REMOVAL
Message-ID: <47FA313B.20809@Voltaire.COM>

Sean,

I'm trying to add a better implementation to this event in iSER (better
than the current BUG() call that we have). I have 2 questions:

1. Is this event raised for each connection?
2. After the event is raised, I guess that I need to release all IB
   resources for that connection, right? If you take a look at
   iser_free_ib_conn_res() (in ulp/iser/iser_verbs.c), you can see
   that we call rdma_destroy_id. This call never returns. Should I
   call rdma_destroy_id while handling RDMA_CM_EVENT_DEVICE_REMOVAL?
Thanks, Erez From swise at opengridcomputing.com Mon Apr 7 07:37:55 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 07 Apr 2008 09:37:55 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: <47FA31C3.5090307@opengridcomputing.com> Roland Dreier wrote: > > If not, can some notes be sent to the list? I say lets learn from what > > you did so far... > > In my experience, getting code to work over both IB and iWARP isn't that > hard. The main points are: > > - Use the RDMA CM for connection establishment (duh) > - Memory regions used to receive RDMA read responses must have "remote > write" permission (since in the iWARP protocol, RDMA read responses > are basically the same as incoming RDMA write requests) > - Active side of the connection must do the first operation > - Don't use IB-specific features (atomics, immediate data) > > Dunno the exact semantics for IB, but: write and send completions for iWARP only indicate the buffer for the IO operation can be reused. It does not indicate the data has been placed in the peers memory. Steve. From erezz at Voltaire.COM Mon Apr 7 07:53:32 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Mon, 07 Apr 2008 17:53:32 +0300 Subject: [ofa-general] About RDMA_CM_EVENT_DEVICE_REMOVAL In-Reply-To: <47FA313B.20809@Voltaire.COM> References: <47FA313B.20809@Voltaire.COM> Message-ID: <47FA356C.9080209@Voltaire.COM> Erez Zilber wrote: > Sean, > > I'm trying to add a better implementation to this event in iSER (better > than the current BUG() call that we have). I have 2 questions: > > 1. Is this event raised for each connection? > 2. After the event is raised, I guess that I need to release all IB > resources for that connection, right? 
If you take a look at > iser_free_ib_conn_res() (in ulp/iser/iser_verbs.c), you can see > that we call rdma_destroy_id. This call never returns. Should I > call rdma_destroy_id while handling RDMA_CM_EVENT_DEVICE_REMOVAL? > > I read some of the cma code, and I see that cma_process_remove calls rdma_destroy_id itself if iser_cma_handler returns a non-zero value. Why? Currently, iser_cma_handler returns 0 (success), so rdma_destroy_id is never called... Erez From poornima.kamath at qlogic.com Mon Apr 7 07:56:39 2008 From: poornima.kamath at qlogic.com (Poornima Kamath (Contractor - )) Date: Mon, 7 Apr 2008 07:56:39 -0700 Subject: [ofa-general] Running sdpnetstat on removing ib_sdp module causes kernel panic Message-ID: Hi, I am getting a kernel panic on running sdpnetstat when ib_sdp module is unloaded. Has anyone seen this? I am running OFED-1.3. I have opened a bug in OFED-bugzilla for the same. https://bugs.openfabrics.org/show_bug.cgi?id=996. Regards, Poornima -------------- next part -------------- An HTML attachment was scrubbed... URL: From swise at opengridcomputing.com Mon Apr 7 08:27:28 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 07 Apr 2008 10:27:28 -0500 Subject: [ofa-general] Directions for verbs API extensions In-Reply-To: References: Message-ID: <47FA3D60.3020905@opengridcomputing.com> Hey roland. Nice write-up. Comments in-line below: Roland Dreier wrote: > Here is a little document I wrote trying to summarize all the things > that we might want to add to the verbs API to support device > capabilities that aren't exposed yet. There are a number of issues to > resolve, and answers to the questions I ask below would help us make > progress towards actually supporting all this. > > There are a number of verbs that are common to the iWARP/RDMA > consortium verbs and the InfiniBand base memory management extensions > (IB-BMME). 
We would probably add one device capability bit for "BMME" > (and all iWARP devices could set it) to show support for everything here: > > - Allocate L_Key/STag. This allocates MR resources without actually > registering memory; the MR can then be registered or invalidated as > described below. > > - "Fast register" memory through send queue. This allows a work > request to be posted to a send queue to register memory using an > L_Key/STag that is in the invalid state. > > - Local invalidate send work requests, which can be used to > invalidate an MR or MW. One subtle point here is that local > invalidate operations have very loose ordering, in the sense that > they can be executed before earlier requests, but support for > fencing local invalidate operations is mandatory in iWARP and only > optional in IB. But is there any IB device that currently exists > that supports BMME but doesn't support local invalidate fencing? > I really hope we can ignore this possibility. > > - Memory windows associated to a single QP and bound using send work > requests posted with the normal post send verb rather than a > separate MW verb. (See below for more) > > In addition there are things that are optional in both specs: > > - Block-list physical buffer lists; this allows memory regions to be > registered with arbitrary size/alignment blocks instead of just > page-aligned chunks. Yet another capability bit if we want to > expose this. > > There are a few discrepancies between the iWARP and IB verbs that we > need to decide on how we want to handle: > > - In IB-BMME, L_Keys and R_Keys are split up so that there is an > 8-bit "key" that is owned by the consumer. As far as I know, there > is no analogous concept defined for iWARP STags; is there any point > in supporting this IB-only feature (which is optional even in the > IB spec)? > > In fact there is an 8b key for stags as well. 
The stag is composed of a 3B index allocated by the driver/hw, and a 1B key specified by the consumer. None of this is exposed in the linux rdma interface at this point and cxgb3 always sets the key to 0xff. > - Along similar lines, IB defines two types of memory windows, "type > 1" and "type 2" and in fact type 2 is split into "2A" and "2B" (the > difference is basically whether the MW is associated with just a > QP, or with a QP and a PD). iWARP memory windows are always what > the IB spec would call type 2B. All the IB devices that I know of > with IB-BMME support can handle type 2B memory windows. Is there > any point in having our API worry about the distinction between 2A > or 2B, or should we just decree that we only handle type 2B? (Does > anyone who hasn't just been reading specs even understand the > distinction between type 2A and 2B?) > > - Further, the MW API that we have now, with a separate bind MW verb, > corresponds to type 1 MWs. Type 2 MWs are bound by posting a work > request using the standard "post send" verb. Given that no IB > device drivers have implemented the bind MW verb yet, does it make > sense to deprecate the API for type 1 MWs and say that everyone > should use type 2[B] MWs only? > > The chelsio driver supports the iwarp bind_mw SQ WR via the current API. In fact the current API implies that this call is actually a SQ operation anyway: > /** > * ib_bind_mw - Posts a work request to the send queue of the specified > * QP, which binds the memory window to the given address range and > * remote access attributes. How is the current bind_mw API not valid or correct for iwarp MWs? Other than being a different call than ib_post_send()? > - iWARP supports "RDMA read with invalidate" send work requests, > while IB has no such operation. This makes sense because iWARP > requires the buffer used to receive RDMA read responses to have > remote write permission, while IB has no such requirement. 
I don't > see a really clean way to handle this except to say that apps have > to have "if (IB) do_this(); else /* iWARP */ do_that();" code to > use this in a portable way. > Or a transport independent app can always use 2 WRs, read + inv-local-stag/fenced instead of read-inv-local-stag. > - Zero-based virtual addresses for memory regions. This is mandatory > for iWARP and optional for IB (and is not required even for BMME). > I think the simplest thing to do is just to have yet another > capability bit to say whether a device supports ZBVA or not; all > iWARP devices can set it. > > Currently, nobody is using this nor the block mode feature. I don't think we should bother supporting them unless someone has an app in mind that will utilize them. > Finally, there are proprietary verbs extensions that are only > supported by a single device at the moment, which we have to decide if > and how to support. It is a tradeoff between making useful features > available versus making the already overly complex verbs API even more > impossible to fathom, although it seems all of these have users asking > for them: > > - ConnectX has XRC, masked atomic operations, and the "block > loopback" flag for UD QPs at least. > > - eHCA has "low-latency" QPs. 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hrosenstock at xsigo.com Mon Apr 7 08:29:00 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 07 Apr 2008 08:29:00 -0700 Subject: [ofa-general] XmtDiscards Message-ID: <1207582140.15625.284.camel@hrosenstock-ws.xsigo.com> Hi again Bernd, On Mon, 2008-04-07 at 15:53 +0200, Bernd Schubert wrote: > Hello Hal, > > On Monday 07 April 2008 15:35:10 Hal Rosenstock wrote: > > Hi Bernd, > > > > On Sun, 2008-04-06 at 18:05 +0200, Bernd Schubert wrote: > > > Hello Hal, > > > > > > > > Searching for this error I find "This is a symptom of congestion and > > > > > may require tweaking either HOQ or switch lifetime values". > > > > > Well, I have to admit I neither know what HOQ is, nor do I know how > > > > > to tweak it. I also do not have an idea to set switch lifetime > > > > > values. I guess this isn't related to the opensm timeout option, is > > > > > it? > > > > > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on > > > > > these switches, but is this also possible on Flextronics switches? > > > > > > > > What routing algorithm are you using ? Rather than play with those > > > > switch values, if you are not using up/down, could you try that to see > > > > if it helps with the congestion you are seeing ? > > > > > > I now configured up/down, but still got XmtDiscards, though, only on one > > > port. > > > > > > Error check on lid 205 (SW_pfs1_leaf2) port all: FAILED > > > #warn: counter XmtDiscards = 6213 (threshold 100) lid 205 port 1 > > > Error check on lid 205 (SW_pfs1_leaf2) port 1: FAILED > > > #warn: counter RcvSwRelayErrors = 1431 (threshold 100) lid 205 port 13 > > > Error check on lid 205 (SW_pfs1_leaf2) port 13: FAILED > > > > Are you running IPoIB ? 
If so, SwRelayErrors are not necessarily > > indicative of a "real" issue due to the fact that multicasts reflected > > on the same port are mistakenly counted. > > so far only Lustre did IPoIB for network initialization. Once it finds a > working connection it does RDMA. But I'm not sure about what it does in case > of problems, e.g. server reboot, I guess it then does again IPoIB. > > Is there a way to find out if these RcvSwRelayErrors are due to multicast or > due to real problems? While there're no counters which break this down into the 3 buckets AFAIK, one can analyze that switch for the other 2 causes. That's the best I'm aware of that can be done. -- Hal > > > I'm also not sure if up/down is the optimal algorithm for a fabric with > > > only two switches. > > > > > > Since describing the connections in words is a bit difficult, I just > > > upload a drawing here: > > > > > > http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ib/Interswitch-ca > > >bling.pdf > > > > > > The root-guid for the up/down algorithm is leaf-5 of of the small switch. > > > But I'm still not sure about up/down at all. Doesn't one need for up/down > > > at least 3 switches? Something like this ascii graphic below? > > > > > > > > > root-switch > > > / \ > > > / \ > > > Sw-1 ------------ Sw-2 > > > > Doesn't your chassis switch have many switches in it ? You did say it > > was 144 ports so it's made up of a number of switches. > > Yes, it's made up of a number of switches. > > > > > You may need to choose a "better" root than up/down automatically > > determines. > > > > Opensm isn't able to detect a root itself at all. As said above I first > configured leaf-5 of the small switch (see the pdf file above), but now > switched it to leaf-6 guid. I have no idea which would be optimal for our > switches - I guess I have to create a drawing from the ibnetdiscover output > to figure this out. Yes. > I will also later on try to check with ibutils if it detects errors. 
Sure; that would be good too. -- Hal > Thanks, > Bernd > > From weiny2 at llnl.gov Mon Apr 7 09:49:06 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 7 Apr 2008 09:49:06 -0700 Subject: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values (Was: Re: [ofa-general] XmtDiscards) In-Reply-To: <1207401479.15625.221.camel@hrosenstock-ws.xsigo.com> References: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com> <1207401479.15625.221.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080407094906.7165dc20.weiny2@llnl.gov> On Sat, 05 Apr 2008 06:17:59 -0700 Hal Rosenstock wrote: > On Fri, 2008-04-04 at 17:48 -0700, Boris Shpolyansky wrote: > > Bernd, > > > > 0x14 is the maximal value for HOQ lifetime, which effectively disables > > the mechanism. I think you shouldn't exceed this value. > > True about the maximal value but any 5 bit value > 19 (up through 31) > should effectively be the same thing according to the spec. > > I also think that OpenSM could do a better job validating and setting > this and other similar optional parameters. > As a start here is a patch which checks the HOQ life values. Ira >From 9e05f091a3c9173045f523aee245e98af1bf74f3 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Mon, 7 Apr 2008 08:31:46 -0700 Subject: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values Signed-off-by: Ira K. 
Weiny --- opensm/opensm/osm_subnet.c | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 47d735f..29d7cdc 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -1045,6 +1045,28 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) p_opts->force_link_speed = IB_PORT_LINK_SPEED_ENABLED_MASK; } + if (0x14 < p_opts->head_of_queue_lifetime) { + sprintf(buff, + " Invalid Cached Option Value:head_of_queue_lifetime = %u:" + "Using Default:%u\n", p_opts->head_of_queue_lifetime, + OSM_DEFAULT_HEAD_OF_QUEUE_LIFE); + printf(buff); + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + p_opts->head_of_queue_lifetime = + OSM_DEFAULT_HEAD_OF_QUEUE_LIFE; + } + + if (0x14 < p_opts->leaf_head_of_queue_lifetime) { + sprintf(buff, + " Invalid Cached Option Value:leaf_head_of_queue_lifetime = %u:" + "Using Default:%u\n", p_opts->leaf_head_of_queue_lifetime, + OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE); + printf(buff); + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + p_opts->leaf_head_of_queue_lifetime = + OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE; + } + if (strcmp(p_opts->console, OSM_DISABLE_CONSOLE) && strcmp(p_opts->console, OSM_LOCAL_CONSOLE) #ifdef ENABLE_OSM_CONSOLE_SOCKET -- 1.5.1 From hrosenstock at xsigo.com Mon Apr 7 10:06:06 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 07 Apr 2008 10:06:06 -0700 Subject: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values (Was: Re: [ofa-general] XmtDiscards) In-Reply-To: <20080407094906.7165dc20.weiny2@llnl.gov> References: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com> <1207401479.15625.221.camel@hrosenstock-ws.xsigo.com> <20080407094906.7165dc20.weiny2@llnl.gov> Message-ID: <1207587966.15625.317.camel@hrosenstock-ws.xsigo.com> On Mon, 2008-04-07 at 09:49 -0700, Ira Weiny wrote: > On Sat, 05 Apr 2008 06:17:59 -0700 > Hal Rosenstock wrote: > > > 
On Fri, 2008-04-04 at 17:48 -0700, Boris Shpolyansky wrote: > > > Bernd, > > > > > > 0x14 is the maximal value for HOQ lifetime, which effectively disables > > > the mechanism. I think you shouldn't exceed this value. > > > > True about the maximal value but any 5 bit value > 19 (up through 31) > > should effectively be the same thing according to the spec. > > > > I also think that OpenSM could do a better job validating and setting > > this and other similar optional parameters. > > > > As a start here is a patch which checks the HOQ life values. > > Ira > > From 9e05f091a3c9173045f523aee245e98af1bf74f3 Mon Sep 17 00:00:00 2001 > From: Ira K. Weiny > Date: Mon, 7 Apr 2008 08:31:46 -0700 > Subject: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values > > > Signed-off-by: Ira K. Weiny > --- > opensm/opensm/osm_subnet.c | 22 ++++++++++++++++++++++ > 1 files changed, 22 insertions(+), 0 deletions(-) > > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index 47d735f..29d7cdc 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -1045,6 +1045,28 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) > p_opts->force_link_speed = IB_PORT_LINK_SPEED_ENABLED_MASK; > } > > + if (0x14 < p_opts->head_of_queue_lifetime) { > + sprintf(buff, > + " Invalid Cached Option Value:head_of_queue_lifetime = %u:" > + "Using Default:%u\n", p_opts->head_of_queue_lifetime, > + OSM_DEFAULT_HEAD_OF_QUEUE_LIFE); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + p_opts->head_of_queue_lifetime = > + OSM_DEFAULT_HEAD_OF_QUEUE_LIFE; > + } > + > + if (0x14 < p_opts->leaf_head_of_queue_lifetime) { > + sprintf(buff, > + " Invalid Cached Option Value:leaf_head_of_queue_lifetime = %u:" > + "Using Default:%u\n", p_opts->leaf_head_of_queue_lifetime, > + OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + 
p_opts->leaf_head_of_queue_lifetime = > + OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE; > + } > + Should these be set to max rather than default as it seems that that's what they're more likely trying to do? -- Hal > if (strcmp(p_opts->console, OSM_DISABLE_CONSOLE) > && strcmp(p_opts->console, OSM_LOCAL_CONSOLE) > #ifdef ENABLE_OSM_CONSOLE_SOCKET From or.gerlitz at gmail.com Mon Apr 7 10:39:00 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 7 Apr 2008 20:39:00 +0300 Subject: [ofa-general] About RDMA_CM_EVENT_DEVICE_REMOVAL In-Reply-To: <47FA313B.20809@Voltaire.COM> References: <47FA313B.20809@Voltaire.COM> Message-ID: <15ddcffd0804071039q48f55544ja89ff2f60ae5592b@mail.gmail.com> On Mon, Apr 7, 2008 at 5:35 PM, Erez Zilber wrote: > 1. Is this event raised for each connection? It is raised per rdma cm id, which is bound to a device; in the initiator case that means per connection. > 2. After the event is raised, I guess that I need to release all IB > resources for that connection, right? If you take a look at > iser_free_ib_conn_res() (in ulp/iser/iser_verbs.c), you can see > that we call rdma_destroy_id. This call never returns. Should I > call rdma_destroy_id while handling RDMA_CM_EVENT_DEVICE_REMOVAL? You are not allowed to call rdma_destroy_id from the context of your callback... this is documented in the rdma-cm .h file. Just return non-zero from the callback if you want the rdma cm to destroy the id for you... Or From tziporet at dev.mellanox.co.il Mon Apr 7 10:54:02 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 07 Apr 2008 10:54:02 -0700 Subject: [ofa-general] Re: Plan for OFED-1.3.1? In-Reply-To: <200804071331.42031.ossrosch@linux.vnet.ibm.com> References: <200804071331.42031.ossrosch@linux.vnet.ibm.com> Message-ID: <47FA5FBA.8030907@mellanox.co.il> Stefan Roscher wrote: > Hi, > > is there any schedule for the OFED-1.3.1 release?
Schedule is May 29 (I will present it as part of OFED 1.3 session today) > When should we start to send some minor bugfixes for ehca? You can start now > Would the kernel-base be the same 2.6.24 or will it change to 2.6.25? > Kernel base will not change Tziporet From tziporet at dev.mellanox.co.il Mon Apr 7 11:00:31 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 07 Apr 2008 11:00:31 -0700 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? In-Reply-To: References: <20080404204758.GU29410@sgi.com> Message-ID: <47FA613F.3070301@mellanox.co.il> Roland Dreier wrote: > > I know it's a long shot, but has anyone tried using OFED on > > a kernel with 64Kbyte pages? > > > > SGI would like to support that, but I've gotten reports that > > something is not working (e.g., "ib_rdma_bw" doesn't work on > > an ia64 kernel with 64Kb pages). This is with the mthca driver, > > fwiw. > > > > Unfortunately a conspiracy of h/w prevents me from reproducing > > this right now, so I don't have more details. But I'd be very > > curious to know if anyone can verify that OFED does/doesn't > > work with 64Kbyte pages. > > I don't know about OFED, but I've tried various things on 64KB PAGE_SIZE > systems and it seems to work. It wouldn't surprise me if there are > issues since the drivers and firmware gets a lot less testing in such > situations but it "should work" -- I'd be happy to help debug if anyone > has concrete problems. > OFED was tested on PPC64 with RHEL5.1 which works with 64K pages as a default.
This was tested with our ConnectX cards (mlx4 driver) I think IBM are using the same OS for their ehca cards too Tziporet From tziporet at dev.mellanox.co.il Mon Apr 7 11:32:06 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 07 Apr 2008 11:32:06 -0700 Subject: [ofa-general] [PATCH/RFC 1/2] IB/core: Add support for "send with invalidate" work requests In-Reply-To: References: Message-ID: <47FA68A6.8020109@mellanox.co.il> Roland Dreier wrote: > OK here's an updated series of the kernel side, with the invalidate > stuff moved to a new opcode. I also decided after thinking about it > that I liked Eli's suggestion of putting the invalidate rkey in a union > with imm_data. This won't work for libibverbs where we have to preserve > the API but I guess we can burn that bridge when we come to it... > I think send w/invalidate is for kernel keys only (at least in IB) so not clear we need it in libibverbs at all Tziporet From rajouri.jammu at gmail.com Mon Apr 7 11:51:34 2008 From: rajouri.jammu at gmail.com (Rajouri Jammu) Date: Mon, 7 Apr 2008 11:51:34 -0700 Subject: [ofa-general] OFED 1.3 user source rpm Message-ID: <3307cdf90804071151u7b47ad6csd57efaea13455cdb@mail.gmail.com> Hi, I could not find the ofa_user rpm in OFED 1.3. In older releases there was a way to create a separate rpm for the user src. OFED-1.2.5.4]# grep ofa_user * build_env.sh:OFA_USER_SRC_RPM=$(/bin/ls -1 ${SRPMS}/ofa_user*.src.rpm 2> $NULL) BUILD_ID:ofa_user-1.2.5.4: build.sh:# Create RPMs for selected packages from ofa_user and ofa_kernel I couldn't find anything like that in OFED 1.3. Is there a way for me to look at the OFED 1.3 user mode sources? thanks.
URL: From jeremy at goop.org Mon Apr 7 12:02:53 2008 From: jeremy at goop.org (Jeremy Fitzhardinge) Date: Mon, 07 Apr 2008 12:02:53 -0700 Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim In-Reply-To: <20080405004127.GG14784@duo.random> References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> <47F6B5EA.6060106@goop.org> <20080405004127.GG14784@duo.random> Message-ID: <47FA6FDD.9060605@goop.org> Andrea Arcangeli wrote: > On Fri, Apr 04, 2008 at 04:12:42PM -0700, Jeremy Fitzhardinge wrote: > >> I think you can break this if() down a bit: >> >> if (!(vma->vm_file && vma->vm_file->f_mapping)) >> continue; >> > > It makes no difference at runtime, coding style preferences are quite > subjective. > Well, overall the formatting of that if statement is very hard to read. Separating out the logically distinct pieces in to different ifs at least shows the reader that they are distinct. Aside from that, doing some manual CSE to remove all the casts and expose the actual thing you're testing for would help a lot (are the casts even necessary?).
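Jeremy's suggestion above — splitting the compound condition into separate ifs and hoisting the repeated `vma->vm_file` dereference into a local — can be illustrated with a small userspace sketch. The structure definitions here are simplified stand-ins for the kernel's real `struct vm_area_struct` and friends, kept only to show the early-continue style:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures under discussion. */
struct address_space { int dummy; };
struct file { struct address_space *f_mapping; };
struct vm_area_struct {
	struct file *vm_file;
	struct vm_area_struct *vm_next;
};

/* Count the VMAs whose i_mmap lock would need taking.  Instead of one
 * long compound if(), hoist the common subexpression into a local and
 * use an early continue so each test reads on its own. */
static int count_lockable_mappings(struct vm_area_struct *vma)
{
	int n = 0;

	for (; vma; vma = vma->vm_next) {
		struct file *file = vma->vm_file;	/* manual CSE */

		if (!file)
			continue;	/* anonymous mapping */
		if (!file->f_mapping)
			continue;	/* no address_space to lock */
		n++;
	}
	return n;
}
```

The behavior is identical to the single compound test; the point is only that each skip condition gets its own line and its own implicit comment.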
>> So this is an O(n^2) algorithm to take the i_mmap_locks from low to high >> order? A comment would be nice. And O(n^2)? Ouch. How often is it >> called? >> > > It's called a single time when the mmu notifier is registered. It's a > very slow path of course. Any other approach to reduce the complexity > would require memory allocations and it would require > mmu_notifier_register to return -ENOMEM failure. It didn't seem worth > it. > It's per-mm though. How many processes would need to have notifiers? >> And is it necessary to mush lock and unlock together? Unlock ordering >> doesn't matter, so you should just be able to have a much simpler loop, no? >> > > That avoids duplicating .text. Originally they were separated. unlock > can't be a simpler loop because I didn't reserve vm_flags bitflags to > do a single O(N) loop for unlock. If you do malloc+fork+munmap two > vmas will point to the same anon-vma lock, that's why the unlock isn't > simpler unless I mark what I locked with a vm_flags bitflag. Well, it's definitely going to need more comments then. I assumed it would end up locking everything, so unlocking everything would be sufficient. J From swise at opengridcomputing.com Mon Apr 7 12:28:50 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 07 Apr 2008 14:28:50 -0500 Subject: [ofa-general] [PATCH/RFC 1/2] IB/core: Add support for "send with invalidate" work requests In-Reply-To: <47FA68A6.8020109@mellanox.co.il> References: <47FA68A6.8020109@mellanox.co.il> Message-ID: <47FA75F2.3040907@opengridcomputing.com> Tziporet Koren wrote: > Roland Dreier wrote: >> OK here's an updated series of the kernel side, with the invalidate >> stuff moved to a new opcode. I also decided after thinking about it >> that I liked Eli's suggestion of putting the invalidate rkey in a union >> with imm_data. This won't work for libibverbs where we have to preserve >> the API but I guess we can burn that bridge when we come to it...
>> > > I think send w/invalidate is for kernel keys only (at least in IB) so > not clear we need it in libibverbs at all > For iWARP, it's needed for user mode as well... Steve. From andrea at qumranet.com Mon Apr 7 12:35:44 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Mon, 7 Apr 2008 21:35:44 +0200 Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim In-Reply-To: <47FA6FDD.9060605@goop.org> References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> <47F6B5EA.6060106@goop.org> <20080405004127.GG14784@duo.random> <47FA6FDD.9060605@goop.org> Message-ID: <20080407193544.GH20587@duo.random> On Mon, Apr 07, 2008 at 12:02:53PM -0700, Jeremy Fitzhardinge wrote: > It's per-mm though. How many processes would need to have notifiers? There can be up to hundreds of VMs in a single system. Not sure I understand the point of the question though. > Well, it's definitely going to need more comments then. I assumed it would > end up locking everything, so unlocking everything would be sufficient. After your comments, I'm writing an alternate version that will guarantee an O(N) worst case to both sigkill and cond_resched but frankly this is low priority. Without mmu notifiers /dev/kvm can't be given to a normal luser without at least losing mlock ulimits, so the lack of mmu notifiers is a bigger issue than whatever complexity in mm_lock as far as /dev/kvm ownership is concerned.
Shouldn't userspace be able to do send with invalidate for memory windows? - R. From jimmott at austin.rr.com Mon Apr 7 14:11:34 2008 From: jimmott at austin.rr.com (Jim Mott) Date: Mon, 7 Apr 2008 16:11:34 -0500 Subject: [ofa-general] Running sdpnetstat on removing ib_sdp module causes kernel panic In-Reply-To: References: Message-ID: <000001c898f3$fd30fc60$f792f520$@rr.com> I have not seen it, but I am not sure I have tried this. I'll check it and report status (and fix) on your bug (bug 996). It will be this weekend before I can look though. From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Poornima Kamath (Contractor - ) Sent: Monday, April 07, 2008 9:57 AM To: general at lists.openfabrics.org Subject: [ofa-general] Running sdpnetstat on removing ib_sdp module causes kernel panic Hi, I am getting a kernel panic on running sdpnetstat when ib_sdp module is unloaded. Has anyone seen this? I am running OFED-1.3. I have opened a bug in OFED-bugzilla for the same. https://bugs.openfabrics.org/show_bug.cgi?id=996. Regards, Poornima -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Mon Apr 7 14:21:54 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 07 Apr 2008 14:21:54 -0700 Subject: [ofa-general] Directions for verbs API extensions In-Reply-To: <47FA3D60.3020905@opengridcomputing.com> (Steve Wise's message of "Mon, 07 Apr 2008 10:27:28 -0500") References: <47FA3D60.3020905@opengridcomputing.com> Message-ID: > > There are a few discrepancies between the iWARP and IB verbs that we > > need to decide on how we want to handle: > > > > - In IB-BMME, L_Keys and R_Keys are split up so that there is an > > 8-bit "key" that is owned by the consumer. As far as I know, there > > is no analogous concept defined for iWARP STags; is there any point > > in supporting this IB-only feature (which is optional even in the > > IB spec)?
> In fact there is an 8b key for stags as well. The stag is composed of > a 3B index allocated by the driver/hw, and a 1B key specified by the > consumer. None of this is exposed in the linux rdma interface at this > point and cxgb3 always sets the key to 0xff. Oops, I completely missed that in the iWARP verbs spec. Yes, the IB and iWARP verbs agree on the semantics here, so the only issue is that the "key" portion of L_Keys/R_Keys is only supported by IB devices that do BMME. So we can expose this in the API without too much trouble. > The chelsio driver supports the iwarp bind_mw SQ WR via the current > API. In fact the current API implies that this call is actually a SQ > operation anyway: > > /** > > * ib_bind_mw - Posts a work request to the send queue of the specified > > * QP, which binds the memory window to the given address range and > > * remote access attributes. > > How is the current bind_mw API not valid or correct for iwarp MWs? > Other than being a different call than ib_post_send()? That's the only issue. The main impact is that you can't submit an MW bind as part of a list of send WRs. I guess it's not too severe an issue. I don't have any strong feelings here, except that eliminating the separate bind_mw call might be a little cleaner. On the other hand it adds more conditional branches to post_send so maybe it's a net lose. > > - iWARP supports "RDMA read with invalidate" send work requests, > > while IB has no such operation. This makes sense because iWARP > > requires the buffer used to receive RDMA read responses to have > > remote write permission, while IB has no such requirement. I don't > > see a really clean way to handle this except to say that apps have > > to have "if (IB) do_this(); else /* iWARP */ do_that();" code to > > use this in a portable way. > Or a transport independent app can always use 2 WRs, read + > inv-local-stag/fenced instead of read-inv-local-stag. 
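One way to paper over the transport difference Roland describes is to hide the WR construction behind a small helper, so only one place in the application carries the "if iWARP, one WR; else read plus fenced local invalidate" decision. A minimal sketch follows; the types are simplified stand-ins for illustration (the real work-request structures live in <infiniband/verbs.h> and differ in detail):

```c
/* Simplified stand-ins for verbs work-request types; names here are
 * illustrative, not the real <infiniband/verbs.h> definitions. */
enum wr_opcode { WR_RDMA_READ, WR_LOCAL_INV, WR_RDMA_READ_WITH_INV };
enum { WR_SEND_FENCE = 1 };

struct wr {
	enum wr_opcode opcode;
	unsigned int send_flags;
	unsigned int invalidate_rkey;
};

/* Emit the WR list for "RDMA read, then invalidate the local stag".
 * On a transport with read-with-invalidate (iWARP) one WR suffices;
 * otherwise fall back to a read followed by a fenced local invalidate,
 * so the invalidate cannot overtake the read.  Returns the WR count. */
static int build_read_then_invalidate(int have_read_with_inv,
				      unsigned int rkey, struct wr out[2])
{
	if (have_read_with_inv) {
		out[0] = (struct wr){ WR_RDMA_READ_WITH_INV, 0, rkey };
		return 1;
	}
	out[0] = (struct wr){ WR_RDMA_READ, 0, 0 };
	out[1] = (struct wr){ WR_LOCAL_INV, WR_SEND_FENCE, rkey };
	return 2;
}
```

The caller then posts whatever list comes back, and the transport check is confined to this one helper rather than scattered through the data path. The fence on the fallback path matters: without it the local invalidate could complete before the read's data has landed.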
Except that fenced local invalidate is optional on IB ;) But as I said I think we can assume that IB devices that support local invalidate support fencing it. > > - Zero-based virtual addresses for memory regions. This is mandatory > > for iWARP and optional for IB (and is not required even for BMME). > > I think the simplest thing to do is just to have yet another > > capability bit to say whether a device supports ZBVA or not; all > > iWARP devices can set it. > Currently, nobody is using this nor the block mode feature. I don't > think we should bother supporting them unless someone has an app in > mind that will utilize them. I agree that block mode seems dubious. I believe that iSER on iWARP requires ZBVA though. - R. From sashak at voltaire.com Mon Apr 7 18:44:06 2008 From: sashak at voltaire.com (Sasha Copyist) Date: Tue, 8 Apr 2008 01:44:06 +0000 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <200804041147.27565.bs@q-leap.de> References: <200804041147.27565.bs@q-leap.de> Message-ID: <20080408014406.GA16864@sashak.voltaire.com> Hi Bernd, On 11:47 Fri 04 Apr , Bernd Schubert wrote: > > opensm-3.2.1 logs some error messages like this: > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > ERR 0108: Unknown remote side for node 0x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling list > Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 hop path: > Path = 0,1,14,13 > > > From ibnetdiscover output I see port13 of this switch is a switch-interconnect > (sorry, I don't know what the correct name/identifier is for switches within > switches): > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid 263 > 4xSDR It is possible that port was DOWN during the first subnet discovery. Eventually everything should be initialized after those messages. Isn't that the case here?
Sasha From swise at opengridcomputing.com Mon Apr 7 16:06:59 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 07 Apr 2008 18:06:59 -0500 Subject: [ofa-general] Directions for verbs API extensions In-Reply-To: References: <47FA3D60.3020905@opengridcomputing.com> Message-ID: <47FAA913.7090805@opengridcomputing.com> > > Currently, nobody is using this nor the block mode feature. I don't > > think we should bother supporting them unless someone has an app in > > mind that will utilize them. > > I agree that block mode seems dubious. I believe that iSER on iWARP > requires ZBVA though. > You're right. However, iSER as it's spec'd in the IETF cannot work in Linux due to the linux networking maintainer's insistence that RDMA connections not share the same port space. Specifically, the spec mandates (for TCP only, not IB) that the connection used for the iSCSI login be migrated into rdma mode. I.e., you cannot start a different connection for doing the data moving part... Steve.
We also have a reliable multicast feature we wish to add Tziporet From tziporet at dev.mellanox.co.il Mon Apr 7 17:51:12 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 07 Apr 2008 17:51:12 -0700 Subject: [ofa-general] [PATCH/RFC 1/2] IB/core: Add support for "send with invalidate" work requests In-Reply-To: References: <47FA68A6.8020109@mellanox.co.il> Message-ID: <47FAC180.3040303@mellanox.co.il> Roland Dreier wrote: > > I think send w/invalidate is for kernel keys only (at least in IB) so > > not clear we need it in libibverbs at all > > Really? Shouldn't userspace be able to do send with invalidate for > memory windows? > Yes but we actually have not implemented memory windows in IB either :-) Tziporet From grossmann at hlrs.de Tue Apr 8 01:13:52 2008 From: grossmann at hlrs.de (Thomas Großmann) Date: Tue, 8 Apr 2008 10:13:52 +0200 Subject: [ofa-general] kernel ib build (OFED 1.3) fails on SLES 10 Message-ID: <200804081013.52983.grossmann@hlrs.de> Hi, the kernel ib build (OFED 1.3) fails on SLES 10. You'll find the output attached.
Best regards, Thomas -- Thomas Großmann                  High Performance Computing Center Stuttgart (HLRS)                                         Allmandring 30                                                  70550 Stuttgart, Germany    E-Mail: grossmann at hlrs.de                                                                Phone: ++49-711-685-65529  Fax  : ++49-711-685-65832 -------------- next part -------------- warning: user vlad does not exist - using root warning: group vlad does not exist - using root warning: user vlad does not exist - using root warning: group vlad does not exist - using root Installing /root/OFED-1.3/SRPMS/ofa_kernel-1.3-ofed1.3.src.rpm Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.52212 + umask 022 + cd /var/tmp/OFED_topdir/BUILD + cd /var/tmp/OFED_topdir/BUILD + rm -rf ofa_kernel-1.3 + /usr/bin/gzip -dc /var/tmp/OFED_topdir/SOURCES/ofa_kernel-1.3.tgz + tar -xvvf - drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/refs/heads/ -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/heads/ofed_kernel -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/heads/ofed_kernel_2_6_24_rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/heads/ofed_kernel_2_6_23 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:54 ofa_kernel-1.3/.git/refs/heads/master drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/ -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/sdp_ofed_1_1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 
ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc7 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 
ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc7 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 
ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc7 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc7 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 
ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc7
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc1
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc4
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc6
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc7
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc1
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc4
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc6
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc7
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc8
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc9
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.24
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.24-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.24-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.24-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc1
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc4
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc6
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2.5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2.c-10
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2.c-11
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2.c-9
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-beta2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc1
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc4
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc6
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/branches/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/info/
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:49 ofa_kernel-1.3/.git/info/exclude
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/
-rw-r--r-- vlad/vlad 441 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/applypatch-msg
-rw-r--r-- vlad/vlad 781 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/commit-msg
-rw-r--r-- vlad/vlad 152 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/post-commit
-rw-r--r-- vlad/vlad 511 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/post-receive
-rw-r--r-- vlad/vlad 207 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/post-update
-rw-r--r-- vlad/vlad 388 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/pre-applypatch
-rw-r--r-- vlad/vlad 1696 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/pre-commit
-rw-r--r-- vlad/vlad 4262 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/pre-rebase
-rw-r--r-- vlad/vlad 1949 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/update
-rw-r--r-- vlad/vlad 58 2008-02-28 09:59:49 ofa_kernel-1.3/.git/description
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/objects/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/objects/info/
-rw-r--r-- vlad/vlad 39 2008-02-28 09:59:49 ofa_kernel-1.3/.git/objects/info/alternates
-rw-r--r-- vlad/vlad 23 2008-02-28 09:59:49 ofa_kernel-1.3/.git/HEAD
-rw-r--r-- vlad/vlad 92 2008-02-28 09:59:49 ofa_kernel-1.3/.git/config
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/logs/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/logs/refs/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/logs/refs/heads/
-rw-r--r-- vlad/vlad 222 2008-02-28 09:59:49 ofa_kernel-1.3/.git/logs/refs/heads/ofed_kernel
-rw-r--r-- vlad/vlad 222 2008-02-28 09:59:49 ofa_kernel-1.3/.git/logs/refs/heads/ofed_kernel_2_6_23
-rw-r--r-- vlad/vlad 222 2008-02-28 09:59:49 ofa_kernel-1.3/.git/logs/refs/heads/ofed_kernel_2_6_24_rc1
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:54 ofa_kernel-1.3/.git/logs/refs/heads/master
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:54 ofa_kernel-1.3/.git/logs/HEAD
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/
-rwxr-xr-x vlad/vlad 1334 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/ofed_checkout.sh
-rw-r--r-- vlad/vlad 331 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/90-ib.rules
-rw-r--r-- vlad/vlad 616 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/Makefile
-rwxr-xr-x vlad/vlad 38197 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/configure
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/iscsi_scsi_makefile
-rw-r--r-- vlad/vlad 15698 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/makefile
-rwxr-xr-x vlad/vlad 26219 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/ofa_kernel.spec
-rwxr-xr-x vlad/vlad 2921 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/ofed_makedist.sh
-rwxr-xr-x vlad/vlad 13027 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/ofed_patch.sh
-rw-r--r-- vlad/vlad 40 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/openib.conf
-rwxr-xr-x vlad/vlad 43734 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/openibd
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/
-rw-r--r-- vlad/vlad 4081 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/core_locking.txt
-rw-r--r-- vlad/vlad 2289 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/ipoib.txt
-rw-r--r-- vlad/vlad 2236 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/sysfs.txt
-rw-r--r-- vlad/vlad 4939 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/user_mad.txt
-rw-r--r-- vlad/vlad 2981 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/user_verbs.txt
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/
-rw-r--r-- vlad/vlad 1904 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/Kconfig
-rw-r--r-- vlad/vlad 660 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/Makefile
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/
-rw-r--r-- vlad/vlad 790 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/Makefile
-rw-r--r-- vlad/vlad 9613 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/addr.c
-rw-r--r-- vlad/vlad 6214 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/agent.c
-rw-r--r-- vlad/vlad 2160 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/agent.h
-rw-r--r-- vlad/vlad 10355 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/cache.c
-rw-r--r-- vlad/vlad 100297 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/cm.c
-rw-r--r-- vlad/vlad 21553 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/cm_msgs.h
-rw-r--r-- vlad/vlad 71133 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/cma.c
-rw-r--r-- vlad/vlad 1875 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/core_priv.h
-rw-r--r-- vlad/vlad 20057 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/device.c
-rw-r--r-- vlad/vlad 14148 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
-rw-r--r-- vlad/vlad 28919 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/iwcm.c
-rw-r--r-- vlad/vlad 2343 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/iwcm.h
-rw-r--r-- vlad/vlad 85825 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/mad.c
-rw-r--r-- vlad/vlad 6126 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/mad_priv.h
-rw-r--r-- vlad/vlad 27404 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/mad_rmpp.c
-rw-r--r-- vlad/vlad 2175 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/mad_rmpp.h
-rw-r--r-- vlad/vlad 21515 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/multicast.c
-rw-r--r-- vlad/vlad 6506 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/packer.c
-rw-r--r-- vlad/vlad 2326 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/sa.h
-rw-r--r-- vlad/vlad 29093 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/sa_query.c
-rw-r--r-- vlad/vlad 7423 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/smi.c
-rw-r--r-- vlad/vlad 2874 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/smi.h
-rw-r--r-- vlad/vlad 20260 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/sysfs.c
-rw-r--r-- vlad/vlad 33526 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/ucm.c
-rw-r--r-- vlad/vlad 26314 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/ucma.c
-rw-r--r-- vlad/vlad 10141 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/ud_header.c
-rw-r--r-- vlad/vlad 7807 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/umem.c
-rw-r--r-- vlad/vlad 31016 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/user_mad.c
-rw-r--r-- vlad/vlad 6719 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/uverbs.h
-rw-r--r-- vlad/vlad 53964 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/uverbs_cmd.c
-rw-r--r-- vlad/vlad 24216 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/uverbs_main.c
-rw-r--r-- vlad/vlad 5119 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/uverbs_marshall.c
-rw-r--r-- vlad/vlad 19891 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/verbs.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/
-rw-r--r-- vlad/vlad 90 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/Makefile
-rw-r--r-- vlad/vlad 21953 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/memtrack.c
-rw-r--r-- vlad/vlad 1734 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/memtrack.h
-rw-r--r-- vlad/vlad 4505 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/mtrack.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/
-rw-r--r-- vlad/vlad 244 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/Kbuild
-rw-r--r-- vlad/vlad 469 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/Kconfig
-rw-r--r-- vlad/vlad 33363 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2.c
-rw-r--r-- vlad/vlad 13966 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2.h
-rw-r--r-- vlad/vlad 9195 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_ae.c
-rw-r--r-- vlad/vlad 3338 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_ae.h
-rw-r--r-- vlad/vlad 4066 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_alloc.c
-rw-r--r-- vlad/vlad 9984 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_cm.c
-rw-r--r-- vlad/vlad 10552 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_cq.c
-rw-r--r-- vlad/vlad 5589 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_intr.c
-rw-r--r-- vlad/vlad 8859 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_mm.c
-rw-r--r-- vlad/vlad 4594 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_mq.c
-rw-r--r-- vlad/vlad 3235 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_mq.h
-rw-r--r-- vlad/vlad 3012 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_pd.c
-rw-r--r-- vlad/vlad 21809 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_provider.c
-rw-r--r-- vlad/vlad 4101 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_provider.h
-rw-r--r-- vlad/vlad 24792 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_qp.c
-rw-r--r-- vlad/vlad 16626 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_rnic.c
-rw-r--r-- vlad/vlad 5103 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_status.h
-rw-r--r-- vlad/vlad 2415 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_user.h
-rw-r--r-- vlad/vlad 7745 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_vq.c
-rw-r--r-- vlad/vlad 2530 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_vq.h
-rw-r--r-- vlad/vlad 35159 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_wr.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/
-rw-r--r-- vlad/vlad 864 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/Kconfig
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/Makefile
-rw-r--r-- vlad/vlad 5162 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_dbg.c
-rw-r--r-- vlad/vlad 36525 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_hal.c
-rw-r--r-- vlad/vlad 6913 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_hal.h
-rw-r--r-- vlad/vlad 8670 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_resource.c
-rw-r--r-- vlad/vlad 3116 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_resource.h
-rw-r--r-- vlad/vlad 18781 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_wr.h
-rw-r--r-- vlad/vlad 5500 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch.c
-rw-r--r-- vlad/vlad 4547 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch.h
-rw-r--r-- vlad/vlad 55171 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_cm.c
-rw-r--r-- vlad/vlad 5739 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_cm.h
-rw-r--r-- vlad/vlad 5576 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_cq.c
-rw-r--r-- vlad/vlad 7100 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_ev.c
-rw-r--r-- vlad/vlad 4709 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_mem.c
-rw-r--r-- vlad/vlad 32654 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_provider.c
-rw-r--r-- vlad/vlad 9450 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_provider.h
-rw-r--r-- vlad/vlad 27217 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_qp.c
-rw-r--r-- vlad/vlad 2128 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_user.h
-rw-r--r-- vlad/vlad 20279 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/tcb.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/Kconfig
-rw-r--r-- vlad/vlad 545 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/Makefile
-rw-r--r-- vlad/vlad 8960 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_av.c
-rw-r--r-- vlad/vlad 10590 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_classes.h
-rw-r--r-- vlad/vlad 10433 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_classes_pSeries.h
-rw-r--r-- vlad/vlad 11830 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_cq.c
-rw-r--r-- vlad/vlad 5122 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_eq.c
-rw-r--r-- vlad/vlad 11393 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_hca.c
-rw-r--r-- vlad/vlad 23029 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_irq.c
-rw-r--r-- vlad/vlad 2515 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_irq.h
-rw-r--r-- vlad/vlad 6746 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_iverbs.h
-rw-r--r-- vlad/vlad 27987 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_main.c
-rw-r--r-- vlad/vlad 4587 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_mcast.c
-rw-r--r-- vlad/vlad 63998 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_mrmw.c
-rw-r--r-- vlad/vlad 3545 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_mrmw.h
-rw-r--r-- vlad/vlad 3656 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_pd.c
-rw-r--r-- vlad/vlad 6177 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_qes.h
-rw-r--r-- vlad/vlad 53087 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_qp.c
-rw-r--r-- vlad/vlad 19274 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_reqs.c
-rw-r--r-- vlad/vlad 3356 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_sqp.c
-rw-r--r-- vlad/vlad 5321 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_tools.h
-rw-r--r-- vlad/vlad 8730 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_uverbs.c
-rw-r--r-- vlad/vlad 27968 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hcp_if.c
-rw-r--r-- vlad/vlad 9357 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hcp_if.h
-rw-r--r-- vlad/vlad 2456 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hcp_phyp.c
-rw-r--r-- vlad/vlad 2927 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hcp_phyp.h
-rw-r--r-- vlad/vlad 2411 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hipz_fns.h
-rw-r--r-- vlad/vlad 3412 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hipz_fns_core.h
-rw-r--r-- vlad/vlad 8966 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hipz_hw.h
-rw-r--r-- vlad/vlad 7476 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ipz_pt_fn.c
-rw-r--r-- vlad/vlad 8733 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ipz_pt_fn.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/
-rw-r--r-- vlad/vlad 425 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/Kconfig
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/Makefile
-rw-r--r-- vlad/vlad 24608 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_common.h
-rw-r--r-- vlad/vlad 12017 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_cq.c
-rw-r--r-- vlad/vlad 4209 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_debug.h
-rw-r--r-- vlad/vlad 15027 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_diag.c
-rw-r--r-- vlad/vlad 4906 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_dma.c
-rw-r--r-- vlad/vlad 67267 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_driver.c
-rw-r--r-- vlad/vlad 23428 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_eeprom.c
-rw-r--r-- vlad/vlad 70677 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_file_ops.c
-rw-r--r-- vlad/vlad 9387 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_fs.c
-rw-r--r-- vlad/vlad 54076 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_iba6110.c
-rw-r--r-- vlad/vlad 49945 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_iba6120.c
-rw-r--r-- vlad/vlad 31654 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_init_chip.c
-rw-r--r-- vlad/vlad 37435 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_intr.c
-rw-r--r-- vlad/vlad 33139 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h
-rw-r--r-- vlad/vlad 6501 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_keys.c
-rw-r--r-- vlad/vlad 44054 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_mad.c
-rw-r--r-- vlad/vlad 4926 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_mmap.c
-rw-r--r-- vlad/vlad 10334 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_mr.c
-rw-r--r-- vlad/vlad 26505 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_qp.c
-rw-r--r-- vlad/vlad 52463 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_rc.c
-rw-r--r-- vlad/vlad 19080 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_registers.h
-rw-r--r-- vlad/vlad 17517 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_ruc.c
-rw-r--r-- vlad/vlad 9238 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_srq.c
-rw-r--r-- vlad/vlad 10904 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_stats.c
-rw-r--r-- vlad/vlad 20246 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_sysfs.c
-rw-r--r-- vlad/vlad 13730 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_uc.c
-rw-r--r-- vlad/vlad 15626 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_ud.c
-rw-r--r-- vlad/vlad 6025 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_user_pages.c
-rw-r--r-- vlad/vlad 50825 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_verbs.c
-rw-r--r-- vlad/vlad 25960 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_verbs.h
-rw-r--r-- vlad/vlad 8559 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
-rw-r--r-- vlad/vlad 2170 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
-rw-r--r-- vlad/vlad 6209 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/Kconfig
-rw-r--r-- vlad/vlad 107 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/Makefile
-rw-r--r-- vlad/vlad 3493 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/ah.c
-rw-r--r-- vlad/vlad 13929 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/cq.c
-rw-r--r-- vlad/vlad 5388 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/doorbell.c
-rw-r--r-- vlad/vlad 9730 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/mad.c
-rw-r--r-- vlad/vlad 20985 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/main.c
-rw-r--r-- vlad/vlad 8852 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/mlx4_ib.h
-rw-r--r-- vlad/vlad 6663 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/mr.c
-rw-r--r-- vlad/vlad 46124 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/qp.c
-rw-r--r-- vlad/vlad 8788 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/srq.c
-rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/user.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/
-rw-r--r-- vlad/vlad 604 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/Kconfig
-rw-r--r-- vlad/vlad 310 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/Makefile
-rw-r--r-- vlad/vlad 7704 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_allocator.c
-rw-r--r-- vlad/vlad 10152 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_av.c
-rw-r--r-- vlad/vlad 5804 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_catas.c
-rw-r--r-- vlad/vlad 58256 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_cmd.c
-rw-r--r-- vlad/vlad 10839 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_cmd.h
-rw-r--r-- vlad/vlad 2099 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_config_reg.h
-rw-r--r-- vlad/vlad 26337 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_cq.c
-rw-r--r-- vlad/vlad 19050 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_dev.h
-rw-r--r-- vlad/vlad 3550 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_doorbell.h
-rw-r--r-- vlad/vlad 26633 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_eq.c
-rw-r--r-- vlad/vlad 9714 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mad.c
-rw-r--r-- vlad/vlad 38467 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_main.c
-rw-r--r-- vlad/vlad 9659 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mcg.c
-rw-r--r-- vlad/vlad 18187 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_memfree.c
-rw-r--r-- vlad/vlad 5773 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_memfree.h
-rw-r--r-- vlad/vlad 24295 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
-rw-r--r-- vlad/vlad 2654 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_pd.c
-rw-r--r-- vlad/vlad 9420 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_profile.c
-rw-r--r-- vlad/vlad 2067 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_profile.h
-rw-r--r-- vlad/vlad 36121 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_provider.c
-rw-r--r-- vlad/vlad 9102 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_provider.h
-rw-r--r-- vlad/vlad 63097 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_qp.c
-rw-r--r-- vlad/vlad 7799 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_reset.c
-rw-r--r-- vlad/vlad 18070 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_srq.c
-rw-r--r-- vlad/vlad 2435 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_uar.c
-rw-r--r-- vlad/vlad 2707 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_user.h
-rw-r--r-- vlad/vlad 3417 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_wqe.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/
-rw-r--r-- vlad/vlad 488 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/Kconfig
-rw-r--r-- vlad/vlad 115 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/Makefile
-rw-r--r-- vlad/vlad 32042 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes.c
-rw-r--r-- vlad/vlad 18448 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes.h
-rw-r--r-- vlad/vlad 90884 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_cm.c
-rw-r--r-- vlad/vlad 12173 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_cm.h
-rw-r--r-- vlad/vlad 6964 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_context.h
-rw-r--r-- vlad/vlad 105780 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_hw.c
-rw-r--r-- vlad/vlad 35775 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_hw.h
-rw-r--r-- vlad/vlad 58106 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_nic.c
-rw-r--r-- vlad/vlad 3306 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_user.h
-rw-r--r-- vlad/vlad 30864 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_utils.c
-rw-r--r-- vlad/vlad 123589 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_verbs.c
-rw-r--r-- vlad/vlad 5039 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_verbs.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/
-rw-r--r-- vlad/vlad 1858 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/Kconfig
-rw-r--r-- vlad/vlad 290 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/Makefile
-rw-r--r-- vlad/vlad 19322 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib.h
-rw-r--r-- vlad/vlad 37961 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_cm.c
-rw-r--r-- vlad/vlad 6877 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_fs.c
-rw-r--r-- vlad/vlad 21934 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_ib.c
-rw-r--r-- vlad/vlad 32744 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c
-rw-r--r-- vlad/vlad 25265 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
-rw-r--r-- vlad/vlad 7326 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
-rw-r--r-- vlad/vlad 4517 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/
-rw-r--r-- vlad/vlad 500 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/Kconfig
-rw-r--r-- vlad/vlad 125 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/Makefile
-rw-r--r-- vlad/vlad 18305 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iscsi_iser.c
-rw-r--r-- vlad/vlad 11909 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iscsi_iser.h
-rw-r--r-- vlad/vlad 20298 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iser_initiator.c
-rw-r--r-- vlad/vlad 14624 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iser_memory.c
-rw-r--r-- vlad/vlad 22173 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iser_verbs.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/
-rw-r--r-- vlad/vlad 1030 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/Kconfig
-rw-r--r-- vlad/vlad 315 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/Makefile
-rw-r--r-- vlad/vlad 12598 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_config.c
-rw-r--r-- vlad/vlad 5484 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_config.h
-rw-r--r-- vlad/vlad 64068 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_control.c
-rw-r--r-- vlad/vlad 6351 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_control.h
-rw-r--r-- vlad/vlad 8431 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h
-rw-r--r-- vlad/vlad 32535 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_data.c
-rw-r--r-- vlad/vlad 5572 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_data.h
-rw-r--r-- vlad/vlad 17797 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c
-rw-r--r-- vlad/vlad 4651 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h
-rw-r--r-- vlad/vlad 26695 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_main.c
-rw-r--r-- vlad/vlad 4000 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_main.h
-rw-r--r-- vlad/vlad 3526 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c
-rw-r--r-- vlad/vlad 2561 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h
-rw-r--r-- vlad/vlad 7195 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c
-rw-r--r-- vlad/vlad 11063 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h
-rw-r--r-- vlad/vlad 20373 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c
-rw-r--r-- vlad/vlad 2096 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h
-rw-r--r-- vlad/vlad 3107 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h
-rw-r--r-- vlad/vlad 6900 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_util.h
-rw-r--r-- vlad/vlad 29143 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c
-rw-r--r-- vlad/vlad 4876 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/
-rw-r--r-- vlad/vlad 1186 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/Kconfig
-rw-r--r-- vlad/vlad 158 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/Makefile
-rw-r--r-- vlad/vlad 7192 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp.h
-rw-r--r-- vlad/vlad 21016 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp_bcopy.c
-rw-r--r-- vlad/vlad 14446 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp_cma.c
-rw-r--r-- vlad/vlad 59665 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp_main.c
-rw-r--r-- vlad/vlad 278 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp_socket.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/
-rw-r--r-- vlad/vlad 43 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/Kbuild
-rw-r--r-- vlad/vlad 355 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/Kconfig
-rw-r--r-- vlad/vlad 54715 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/ib_srp.c
-rw-r--r-- vlad/vlad 4099 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/ib_srp.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/
-rw-r--r-- vlad/vlad 461 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/Kconfig
-rw-r--r-- vlad/vlad 134 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/Makefile
-rw-r--r-- vlad/vlad 2738 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/ib_dm_mad.h
-rw-r--r-- vlad/vlad 63198 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/ib_srpt.c
-rw-r--r-- vlad/vlad 4650 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/ib_srpt.h
-rw-r--r-- vlad/vlad 74834 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/scsi_tgt.h
-rw-r--r-- vlad/vlad 8508 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/scst_const.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/util/
-rw-r--r-- vlad/vlad 163 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/util/Kconfig
-rw-r--r-- vlad/vlad 72 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/util/Makefile
-rw-r--r-- vlad/vlad 16068 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/util/madeye.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/base/
-rw-r--r-- vlad/vlad 12299 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/base/attribute_container.c
-rw-r--r-- vlad/vlad 9582 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/base/transport_class.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/
-rw-r--r-- vlad/vlad 168 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/Makefile
-rw-r--r-- vlad/vlad 9694 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/adapter.h
-rw-r--r-- vlad/vlad 7172 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/ael1002.c
-rw-r--r-- vlad/vlad 24850 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/common.h
-rw-r--r-- vlad/vlad 4741 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_ctl_defs.h
-rw-r--r-- vlad/vlad 3489 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_defs.h
-rw-r--r-- vlad/vlad 3799 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_ioctl.h
-rw-r--r-- vlad/vlad 67785 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_main.c
-rw-r--r-- vlad/vlad 34174 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.c
-rw-r--r-- vlad/vlad 6018 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.h
-rw-r--r-- vlad/vlad 5887 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/firmware_exports.h
-rw-r--r-- vlad/vlad 12681 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/l2t.c
-rw-r--r-- vlad/vlad 4851 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/l2t.h
-rw-r--r-- vlad/vlad 13874 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/mc5.c
-rw-r--r-- vlad/vlad 57153 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/regs.h
-rw-r--r-- vlad/vlad 82893 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/sge.c
-rw-r--r-- vlad/vlad 7942 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/sge_defs.h
-rw-r--r-- vlad/vlad 34389 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/t3_cpl.h
-rw-r--r-- vlad/vlad 107520 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/t3_hw.c
-rw-r--r-- vlad/vlad 2496 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/t3cdev.h
-rw-r--r-- vlad/vlad 1822 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/version.h
-rw-r--r-- vlad/vlad 6723 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/vsc8211.c
-rw-r--r-- vlad/vlad 19733 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/xgmac.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/
-rw-r--r-- vlad/vlad 162 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/Makefile
-rw-r--r-- vlad/vlad 4873 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/alloc.c
-rw-r--r-- vlad/vlad 4420 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/catas.c
-rw-r--r-- vlad/vlad 11872 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/cmd.c
-rw-r--r-- vlad/vlad 7024 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/cq.c
-rw-r--r-- vlad/vlad 17231 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/eq.c
-rw-r--r-- vlad/vlad 29620 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/fw.c
-rw-r--r-- vlad/vlad 4436 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/fw.h
-rw-r--r-- vlad/vlad 11317 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/icm.c
-rw-r--r-- vlad/vlad 4606 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/icm.h
-rw-r--r-- vlad/vlad 4320 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/intf.c
-rw-r--r-- vlad/vlad 25226 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/main.c
-rw-r--r-- vlad/vlad 9386 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/mcg.c
-rw-r--r-- vlad/vlad 8893 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/mlx4.h
-rw-r--r-- vlad/vlad 15069 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/mr.c
-rw-r--r-- vlad/vlad 3058 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/pd.c
-rw-r--r-- vlad/vlad 7702 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/profile.c
-rw-r--r-- vlad/vlad 8785 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/qp.c
-rw-r--r-- vlad/vlad 4996 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/reset.c
-rw-r--r-- vlad/vlad 7035 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/srq.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/drivers/scsi/
-rw-r--r-- vlad/vlad 63387 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/scsi/iscsi_tcp.c
-rw-r--r-- vlad/vlad 5183 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/scsi/iscsi_tcp.h
-rw-r--r-- vlad/vlad 58063 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/scsi/libiscsi.c
-rw-r--r-- vlad/vlad 43019 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/scsi/scsi_transport_iscsi.c
lrwxrwxrwx vlad/vlad 0 2008-02-28 09:59:56 ofa_kernel-1.3/drivers/scsi/Makefile -> ../../ofed_scripts/iscsi_scsi_makefile
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/
-rw-r--r-- vlad/vlad 26 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/Kbuild
-rw-r--r-- vlad/vlad 4920 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_addr.h
-rw-r--r-- vlad/vlad 4399 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_cache.h
-rw-r--r-- vlad/vlad 18741 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_cm.h
-rw-r--r-- vlad/vlad 3503 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_fmr_pool.h
-rw-r--r-- vlad/vlad 22723 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_mad.h
-rw-r--r-- vlad/vlad 2025 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_marshall.h
-rw-r--r-- vlad/vlad 7794 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_pack.h
-rw-r--r-- vlad/vlad 14310 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_sa.h
-rw-r--r-- vlad/vlad 4519 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_smi.h
-rw-r--r-- vlad/vlad 2664 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_umem.h
-rw-r--r-- vlad/vlad 6564 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_user_cm.h
-rw-r--r-- vlad/vlad 7190 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_user_mad.h
-rw-r--r-- vlad/vlad 1894 2008-02-28
09:59:50 ofa_kernel-1.3/include/rdma/ib_user_sa.h -rw-r--r-- vlad/vlad 13677 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_user_verbs.h -rw-r--r-- vlad/vlad 54235 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_verbs.h -rw-r--r-- vlad/vlad 8777 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/iw_cm.h -rw-r--r-- vlad/vlad 10998 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/rdma_cm.h -rw-r--r-- vlad/vlad 1783 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/rdma_cm_ib.h -rw-r--r-- vlad/vlad 4772 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/rdma_user_cm.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/include/scsi/ -rw-r--r-- vlad/vlad 14941 2008-02-28 09:59:50 ofa_kernel-1.3/include/scsi/iscsi_proto.h -rw-r--r-- vlad/vlad 5621 2008-02-28 09:59:50 ofa_kernel-1.3/include/scsi/srp.h -rw-r--r-- vlad/vlad 10388 2008-02-28 09:59:53 ofa_kernel-1.3/include/scsi/iscsi_if.h -rw-r--r-- vlad/vlad 10226 2008-02-28 09:59:53 ofa_kernel-1.3/include/scsi/libiscsi.h -rw-r--r-- vlad/vlad 8718 2008-02-28 09:59:53 ofa_kernel-1.3/include/scsi/scsi_transport_iscsi.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/ -rw-r--r-- vlad/vlad 5294 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/cmd.h -rw-r--r-- vlad/vlad 3436 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/cq.h -rw-r--r-- vlad/vlad 9523 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/device.h -rw-r--r-- vlad/vlad 2894 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/doorbell.h -rw-r--r-- vlad/vlad 2082 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/driver.h -rw-r--r-- vlad/vlad 6662 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/qp.h -rw-r--r-- vlad/vlad 1560 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/srq.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/ 
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/ -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/atomic.h -rw-r--r-- vlad/vlad 199 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/dma-mapping.h -rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/msr.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/bitops.h -rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/cache.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/compiler.h -rw-r--r-- vlad/vlad 394 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/cpumask.h -rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/debugfs.h -rw-r--r-- vlad/vlad 3801 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/device.h -rw-r--r-- vlad/vlad 376 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/err.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/ethtool.h -rw-r--r-- vlad/vlad 648 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/genalloc.h -rw-r--r-- vlad/vlad 123 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/hardirq.h -rw-r--r-- vlad/vlad 961 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/idr.h -rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/if_infiniband.h -rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 534 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/io.h -rw-r--r-- vlad/vlad 267 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/ioctl32.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/ip.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/kernel.h -rw-r--r-- vlad/vlad 4171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/kfifo.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/log2.h -rw-r--r-- vlad/vlad 690 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/mii.h -rw-r--r-- vlad/vlad 872 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/mm.h -rw-r--r-- vlad/vlad 159 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/module.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/net.h -rw-r--r-- vlad/vlad 845 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/notifier.h -rw-r--r-- vlad/vlad 2251 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/pci.h -rw-r--r-- vlad/vlad 509 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/pci_ids.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/random.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/rtnetlink.h 
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/rwsem.h -rw-r--r-- vlad/vlad 863 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 1263 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/sched.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/signal.h -rw-r--r-- vlad/vlad 3089 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1386 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/slab.h -rw-r--r-- vlad/vlad 642 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/spinlock.h -rw-r--r-- vlad/vlad 527 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/sysfs.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/timer.h -rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/ -rw-r--r-- vlad/vlad 298 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/dst.h -rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/inet_hashtables.h -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/inet_sock.h -rw-r--r-- vlad/vlad 192 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/netevent.h -rw-r--r-- vlad/vlad 9187 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/scsi/scsi_cmnd.h -rw-r--r-- vlad/vlad 13 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/scsi/scsi_dbg.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/genalloc.c -rw-r--r-- vlad/vlad 5845 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/ib_idr.c -rw-r--r-- vlad/vlad 3349 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/netevent.c -rw-r--r-- vlad/vlad 10764 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/stream.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/ -rw-r--r-- vlad/vlad 581 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/atomic.h -rw-r--r-- vlad/vlad 590 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/bitops.h -rw-r--r-- vlad/vlad 153 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/io.h -rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/msr.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/bitops.h -rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/cache.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/cpumask.h -rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/debugfs.h -rw-r--r-- vlad/vlad 3801 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/dma-mapping.h -rw-r--r-- 
vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/err.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/ethtool.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/genalloc.h -rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/idr.h -rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/if_infiniband.h -rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 534 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/ip.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/jiffies.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/kernel.h -rw-r--r-- vlad/vlad 4171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/kfifo.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/log2.h -rw-r--r-- vlad/vlad 574 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/mii.h -rw-r--r-- vlad/vlad 872 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/mm.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/net.h -rw-r--r-- vlad/vlad 817 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/notifier.h -rw-r--r-- vlad/vlad 1704 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/pci.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/rwsem.h -rw-r--r-- vlad/vlad 1027 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 759 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/sched.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/signal.h -rw-r--r-- vlad/vlad 2990 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1566 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/slab.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/timer.h -rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/types.h -rw-r--r-- vlad/vlad 155 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/types.h.orig -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/ -rw-r--r-- vlad/vlad 272 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/dst.h -rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/inet_hashtables.h -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/inet_sock.h -rw-r--r-- vlad/vlad 905 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/netevent.h -rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/sock.h -rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/tcp_states.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/src/genalloc.c -rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/ -rw-r--r-- vlad/vlad 581 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/atomic.h -rw-r--r-- vlad/vlad 590 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/bitops.h -rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/msr.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/ -rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/attribute_container.h -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/bitops.h -rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/cache.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/cpumask.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/crypto.h -rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/debugfs.h -rw-r--r-- vlad/vlad 3801 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/err.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 414 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/ethtool.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/genalloc.h -rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/idr.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/if_ether.h -rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/if_infiniband.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 534 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/ip.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/jiffies.h -rw-r--r-- vlad/vlad 406 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/kernel.h -rw-r--r-- vlad/vlad 4194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/kfifo.h -rw-r--r-- vlad/vlad 1473 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/klist.h -rw-r--r-- vlad/vlad 546 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/kref.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/log2.h -rw-r--r-- vlad/vlad 872 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/mm.h -rw-r--r-- vlad/vlad 719 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/moduleparam.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/net.h
-rw-r--r-- vlad/vlad 844 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 476 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/netlink.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1704 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 1027 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/signal.h
-rw-r--r-- vlad/vlad 2990 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1208 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/slab.h
-rw-r--r-- vlad/vlad 400 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/timer.h
-rw-r--r-- vlad/vlad 2537 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/transport_class.h
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/
-rw-r--r-- vlad/vlad 272 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/dst.h
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/netevent.h
-rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi_device.h
-rw-r--r-- vlad/vlad 170 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi_host.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi_transport.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/
-rw-r--r-- vlad/vlad 43 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/base.h
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/genalloc.c
-rw-r--r-- vlad/vlad 292 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/init.c
-rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/netevent.c
-rw-r--r-- vlad/vlad 1422 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/scsi.c
-rw-r--r-- vlad/vlad 4579 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/scsi_lib.c
-rw-r--r-- vlad/vlad 1445 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/scsi_scan.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/atomic.h
-rw-r--r-- vlad/vlad 590 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/bitops.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/cache.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 3588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/err.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/idr.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 534 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/log2.h
-rw-r--r-- vlad/vlad 482 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/mm.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/net.h
-rw-r--r-- vlad/vlad 816 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/signal.h
-rw-r--r-- vlad/vlad 3274 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1533 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/
-rw-r--r-- vlad/vlad 272 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/dst.h
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/netevent.h
-rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 216 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/scsi/scsi_host.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/src/genalloc.c
-rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/src/netevent.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/atomic.h
-rw-r--r-- vlad/vlad 590 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/bitops.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/cache.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 3582 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/err.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/fs.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/idr.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/log2.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/net.h
-rw-r--r-- vlad/vlad 558 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/pci.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/signal.h
-rw-r--r-- vlad/vlad 2791 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1547 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1486 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/neighbour.h
-rw-r--r-- vlad/vlad 2418 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 216 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/scsi/scsi_host.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-x86_64/
-rw-r--r-- vlad/vlad 634 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-x86_64/dma-mapping.h
-rw-r--r-- vlad/vlad 377 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-x86_64/swiotlb.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/atomic.h
-rw-r--r-- vlad/vlad 192 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/dma-mapping.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/scatterlist.h
-rw-r--r-- vlad/vlad 176 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/swiotlb.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/cache.h
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 4273 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 358 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/idr.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 586 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/log2.h
-rw-r--r-- vlad/vlad 475 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/mm.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/net.h
-rw-r--r-- vlad/vlad 840 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/signal.h
-rw-r--r-- vlad/vlad 2954 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1553 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/netevent.h
-rw-r--r-- vlad/vlad 2418 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 216 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/scsi/scsi_host.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/src/genalloc.c
-rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/src/netevent.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/atomic.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/bitops.h
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 295 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/idr.h
-rw-r--r-- vlad/vlad 521 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 586 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/log2.h
-rw-r--r-- vlad/vlad 475 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/mm.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/net.h
-rw-r--r-- vlad/vlad 816 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/signal.h
-rw-r--r-- vlad/vlad 2961 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1373 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/netevent.h
-rw-r--r-- vlad/vlad 2418 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 216 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/scsi/scsi_host.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/src/genalloc.c
-rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/src/netevent.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/atomic.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/bitops.h
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 295 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/fs.h
-rw-r--r-- vlad/vlad 521 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/log2.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/net.h
-rw-r--r-- vlad/vlad 558 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/pci.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/signal.h
-rw-r--r-- vlad/vlad 2791 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1367 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1486 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/neighbour.h
-rw-r--r-- vlad/vlad 2418 2008-02-28 09:59:50
ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/sock.h -rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/tcp_states.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.14/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/ -rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/atomic.h -rw-r--r-- vlad/vlad 4315 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/msr.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 
2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/cpumask.h -rw-r--r-- vlad/vlad 295 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/genalloc.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/log2.h -rw-r--r-- vlad/vlad 475 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/mm.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/net.h -rw-r--r-- vlad/vlad 816 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/notifier.h -rw-r--r-- vlad/vlad 1307 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/rwsem.h -rw-r--r-- vlad/vlad 928 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/signal.h -rw-r--r-- vlad/vlad 2961 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1113 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/timer.h -rw-r--r-- vlad/vlad 328 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/vmalloc.h 
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/ -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/scsi/scsi_cmnd.h -rw-r--r-- vlad/vlad 216 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/scsi/scsi_host.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/ -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/cpumask.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/genalloc.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/ip.h -rw-r--r-- 
vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/log2.h -rw-r--r-- vlad/vlad 475 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/mm.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/net.h -rw-r--r-- vlad/vlad 809 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/rwsem.h -rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/signal.h -rw-r--r-- vlad/vlad 2634 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1091 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/timer.h -rw-r--r-- vlad/vlad 308 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/ -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/ -rw-r--r-- vlad/vlad 432 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/bitops.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/cpumask.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/device.h -rw-r--r-- vlad/vlad 327 
2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/genalloc.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/log2.h -rw-r--r-- vlad/vlad 475 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/mm.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/net.h -rw-r--r-- vlad/vlad 674 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/rwsem.h -rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/signal.h -rw-r--r-- vlad/vlad 2790 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1091 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/timer.h -rw-r--r-- vlad/vlad 308 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/ -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/ -rw-r--r-- vlad/vlad 4322 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/ -rw-r--r-- vlad/vlad 223 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/cpu.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/cpumask.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/inetdevice.h -rw-r--r-- 
vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/net.h -rw-r--r-- vlad/vlad 698 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/rwsem.h -rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/signal.h -rw-r--r-- vlad/vlad 2627 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1091 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/timer.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/ -rw-r--r-- vlad/vlad 4322 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/ -rw-r--r-- vlad/vlad 223 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/cpu.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/crypto.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/net.h -rw-r--r-- vlad/vlad 701 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/notifier.h -rw-r--r-- vlad/vlad 56 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/parser.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/rwsem.h -rw-r--r-- vlad/vlad 736 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/signal.h -rw-r--r-- vlad/vlad 2634 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1091 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/timer.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/net/netevent.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/ -rw-r--r-- vlad/vlad 4155 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/ -rw-r--r-- vlad/vlad 223 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/cpu.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/crypto.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/net.h -rw-r--r-- vlad/vlad 443 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/netdevice.h -rw-r--r-- vlad/vlad 505 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/netdevice.h.orig -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/notifier.h -rw-r--r-- vlad/vlad 56 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/parser.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/rwsem.h -rw-r--r-- vlad/vlad 736 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/signal.h -rw-r--r-- vlad/vlad 2331 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1397 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/timer.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/net/netevent.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm-generic/ -rw-r--r-- vlad/vlad 899 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/ -rw-r--r-- vlad/vlad 4155 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/ -rw-r--r-- vlad/vlad 223 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/compiler.h 
-rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/cpu.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/net.h -rw-r--r-- vlad/vlad 443 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/netdevice.h -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/netlink.h 
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/notifier.h -rw-r--r-- vlad/vlad 56 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/parser.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/rwsem.h -rw-r--r-- vlad/vlad 636 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/signal.h -rw-r--r-- vlad/vlad 2331 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1397 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/timer.h -rw-r--r-- vlad/vlad 138 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/types.h -rw-r--r-- 
vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/net/netevent.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/ -rw-r--r-- vlad/vlad 1289 2008-02-28 
09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/ -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/cpu.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/genalloc.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/net.h -rw-r--r-- vlad/vlad 674 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/rbtree.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/rwsem.h -rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/signal.h -rw-r--r-- vlad/vlad 2627 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/skbuff.h -rw-r--r-- vlad/vlad 764 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/timer.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/types.h -rw-r--r-- 
vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/src/genalloc.c -rw-r--r-- vlad/vlad 3367 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/asm/ -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 
09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/ -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/compiler.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/crypto.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 604 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/ip.h -rw-r--r-- vlad/vlad 258 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/kernel.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/log2.h -rw-r--r-- vlad/vlad 121 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/net.h -rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/netdevice.h -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/pci.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/random.h 
-rw-r--r-- vlad/vlad 230 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/rbtree.h -rw-r--r-- vlad/vlad 736 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 1982 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/skbuff.h -rw-r--r-- vlad/vlad 489 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/slab.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/tcp.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/types.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/net/neighbour.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/src/ -rw-r--r-- vlad/vlad 5439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/src/genalloc.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/ -rw-r--r-- vlad/vlad 1289 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/hvcall.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/compiler.h
-rw-r--r-- vlad/vlad 439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 550 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/ip.h
-rw-r--r-- vlad/vlad 309 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/log2.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/net.h
-rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/notifier.h
-rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/pci.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/pci_regs.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/random.h
-rw-r--r-- vlad/vlad 230 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2332 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 764 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/slab.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/tcp.h
-rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/timer.h
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/utsname.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/
-rw-r--r-- vlad/vlad 1289 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/hvcall.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/compiler.h
-rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/crypto.h
-rw-r--r-- vlad/vlad 439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 173 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 550 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/ip.h
-rw-r--r-- vlad/vlad 258 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/log2.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/net.h
-rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/netlink.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/notifier.h
-rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/pci.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/pci_regs.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/random.h
-rw-r--r-- vlad/vlad 230 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 736 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2333 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 764 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/slab.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/tcp.h
-rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/timer.h
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/utsname.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/
-rw-r--r-- vlad/vlad 1289 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/hvcall.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/compiler.h
-rw-r--r-- vlad/vlad 439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 498 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/ip.h
-rw-r--r-- vlad/vlad 309 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/log2.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/net.h
-rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 339 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/netlink.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/notifier.h
-rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/pci.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/pci_regs.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/random.h
-rw-r--r-- vlad/vlad 230 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2331 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 764 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/slab.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/tcp.h
-rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/timer.h
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/utsname.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/src/
-rw-r--r-- vlad/vlad 5439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/compiler.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/ip.h
-rw-r--r-- vlad/vlad 258 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/log2.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/notifier.h
-rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/pci.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2149 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/slab.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/tcp.h
-rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/timer.h
-rw-r--r-- vlad/vlad 141 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/types.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/compiler.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/ip.h
-rw-r--r-- vlad/vlad 159 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/kernel.h
-rw-r--r-- vlad/vlad 368 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/log2.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/notifier.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2149 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/slab.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/tcp.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/asm/
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/asm/prom.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/compiler.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/ip.h
-rw-r--r-- vlad/vlad 159 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/kernel.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/notifier.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2149 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/slab.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/tcp.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/net/ip.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/linux/compiler.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/linux/slab.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/net/ip.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/linux/compiler.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/linux/slab.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/net/ip.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/scsi/
-rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/linux/
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/linux/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/net/ip.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/
-rw-r--r-- vlad/vlad 581 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/atomic.h
-rw-r--r-- vlad/vlad 590 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/bitops.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/msr.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/
-rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/attribute_container.h
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/cache.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/crypto.h
-rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/debugfs.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 185 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/err.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/idr.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/if_infiniband.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/ip.h
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/jiffies.h
-rw-r--r-- vlad/vlad 534 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/kfifo.h
-rw-r--r-- vlad/vlad 1473 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/klist.h
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/kref.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/log2.h
-rw-r--r-- vlad/vlad 872 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/mm.h
-rw-r--r-- vlad/vlad 719 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/moduleparam.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/net.h
-rw-r--r-- vlad/vlad 844 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 476 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/netlink.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1128 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 1253 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/signal.h
-rw-r--r-- vlad/vlad 3275 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1208 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/slab.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/timer.h
-rw-r--r-- vlad/vlad 2537 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/transport_class.h
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/
-rw-r--r-- vlad/vlad 272 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/dst.h
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 1048 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/netevent.h
-rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi_device.h
-rw-r--r-- vlad/vlad 274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi_host.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi_transport.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/
-rw-r--r-- vlad/vlad 43 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/base.h
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/genalloc.c
-rw-r--r-- vlad/vlad 292 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/init.c
-rw-r--r-- vlad/vlad 3349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
-rw-r--r-- vlad/vlad 1422 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/scsi.c
-rw-r--r-- vlad/vlad 4579 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/scsi_lib.c
-rw-r--r-- vlad/vlad 1445 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/scsi_scan.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm-powerpc/
-rw-r--r-- vlad/vlad 30 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm-powerpc/system.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/
-rw-r--r-- vlad/vlad 581 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/atomic.h
-rw-r--r-- vlad/vlad 590 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/bitops.h
-rw-r--r-- vlad/vlad 9134 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/hvcall.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/scatterlist.h
-rw-r--r-- vlad/vlad 158 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/smp.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/
-rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/attribute_container.h
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/cache.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/compiler.h
-rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/cpu.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/crypto.h
-rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/debugfs.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 185 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/err.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/idr.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/if_infiniband.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 641 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/ip.h
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/jiffies.h
-rw-r--r-- vlad/vlad 534 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/kfifo.h
-rw-r--r-- vlad/vlad 1473 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/klist.h
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/kref.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/log2.h
-rw-r--r-- vlad/vlad 872 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/mm.h
-rw-r--r-- vlad/vlad 719 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/moduleparam.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/net.h
-rw-r--r-- vlad/vlad 844 2008-02-28 09:59:51
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/netdevice.h -rw-r--r-- vlad/vlad 476 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/notifier.h -rw-r--r-- vlad/vlad 1128 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/pci.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/rwsem.h -rw-r--r-- vlad/vlad 1253 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/signal.h -rw-r--r-- vlad/vlad 3275 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1208 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/slab.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/timer.h -rw-r--r-- vlad/vlad 2537 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/transport_class.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/ -rw-r--r-- vlad/vlad 272 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/dst.h -rw-r--r-- vlad/vlad 405 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/inet_hashtables.h -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/netevent.h -rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/sock.h -rw-r--r-- vlad/vlad 80 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/tcp_states.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi_cmnd.h -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi_device.h -rw-r--r-- vlad/vlad 170 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi_host.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi_transport.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/ -rw-r--r-- vlad/vlad 43 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/base.h -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/genalloc.c -rw-r--r-- vlad/vlad 292 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/init.c -rw-r--r-- vlad/vlad 3349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/netevent.c -rw-r--r-- vlad/vlad 1422 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/scsi.c -rw-r--r-- vlad/vlad 4579 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/scsi_lib.c -rw-r--r-- vlad/vlad 1445 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/scsi_scan.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm-generic/ -rw-r--r-- vlad/vlad 899 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm-powerpc/ -rw-r--r-- vlad/vlad 30 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm-powerpc/system.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/ -rw-r--r-- vlad/vlad 581 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/atomic.h -rw-r--r-- vlad/vlad 590 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/bitops.h -rw-r--r-- vlad/vlad 9134 2008-02-28 
09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/hvcall.h -rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/msr.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/scatterlist.h -rw-r--r-- vlad/vlad 158 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/smp.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/ -rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/attribute_container.h -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/bitops.h -rw-r--r-- vlad/vlad 335 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/cache.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/cpu.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/cpumask.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/crypto.h -rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/debugfs.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 185 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/err.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/ethtool.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/genalloc.h -rw-r--r-- vlad/vlad 210 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/idr.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/if_ether.h -rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/if_infiniband.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 575 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 641 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/ip.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/jiffies.h -rw-r--r-- vlad/vlad 534 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/kernel.h -rw-r--r-- vlad/vlad 4194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/kfifo.h -rw-r--r-- vlad/vlad 1473 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/klist.h -rw-r--r-- vlad/vlad 546 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/kref.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/log2.h -rw-r--r-- vlad/vlad 872 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/mm.h -rw-r--r-- vlad/vlad 719 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/moduleparam.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/net.h -rw-r--r-- vlad/vlad 844 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/netdevice.h -rw-r--r-- vlad/vlad 476 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/notifier.h -rw-r--r-- vlad/vlad 1128 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/pci.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/rwsem.h -rw-r--r-- vlad/vlad 1253 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/signal.h -rw-r--r-- vlad/vlad 3275 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1208 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/slab.h -rw-r--r-- vlad/vlad 233 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/timer.h -rw-r--r-- vlad/vlad 2537 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/transport_class.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/ -rw-r--r-- vlad/vlad 272 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/dst.h -rw-r--r-- vlad/vlad 405 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/inet_hashtables.h -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/netevent.h -rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/sock.h -rw-r--r-- vlad/vlad 80 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/tcp_states.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi_cmnd.h -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi_device.h -rw-r--r-- vlad/vlad 170 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi_host.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi_transport.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/ -rw-r--r-- vlad/vlad 43 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/base.h -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/genalloc.c -rw-r--r-- vlad/vlad 292 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/init.c -rw-r--r-- vlad/vlad 3349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/netevent.c -rw-r--r-- vlad/vlad 1422 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/scsi.c -rw-r--r-- vlad/vlad 4579 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/scsi_lib.c -rw-r--r-- vlad/vlad 1445 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/scsi_scan.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel/ -rw-r--r-- vlad/vlad 5225 2008-02-28 
09:59:53 ofa_kernel-1.3/kernel/kfifo.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ -rw-r--r-- vlad/vlad 2671 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/2_misc_device_to_2_6_9.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/addr_3926_to_2_6_13.patch -rw-r--r-- vlad/vlad 709 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/addr_6720_to_2_6_9.patch -rw-r--r-- vlad/vlad 732 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/addr_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 471 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cm_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 5770 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/core_4807_to_2_6_9.patch -rw-r--r-- vlad/vlad 8340 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 715 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cxgb3_t3_hw_to_2.6.5_sles9_sp3.patch -rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cxio_hal_to_2.6.14.patch -rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-01-header.patch -rw-r--r-- vlad/vlad 1665 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-03-iowrite32_copy.patch -rw-r--r-- vlad/vlad 4200 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-05-page-hacks-2.6.14.patch -rw-r--r-- vlad/vlad 1437 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-06-page-hacks-2.6.9.patch -rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-07-iounmap-2.6.9.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-09-sysfs-show-2.6.12.patch -rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-10-rlimit-2.6.9.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-13-class-2.6.9.patch -rw-r--r-- vlad/vlad 3115 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-15-kref-2.6.5.patch -rw-r--r-- vlad/vlad 12734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1096 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-17-ipath_intr-2.6.18.patch -rw-r--r-- vlad/vlad 1221 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath_rev_for_2_6_22.patch -rw-r--r-- vlad/vlad 2984 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipoib_8111_to_2_6_16.patch -rw-r--r-- vlad/vlad 1983 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipoib_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 6183 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipoib_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1939 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipoib_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 1412 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/iwch_cm_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 1505 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/linux_stuff_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 466 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/mlx4_makefile_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 469 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/mthca_catas_reset_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/mthca_dev_3465_to_2_6_11.patch -rw-r--r-- vlad/vlad 908 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/rds_to_2_6_9.patch -rw-r--r-- vlad/vlad 1056 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/sa_query_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 2239 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/sdp_bcopy_8802_to_2_6_5-7.244.patch -rw-r--r-- vlad/vlad 465 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/sdp_cma_8111_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 8459 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/sdp_main_to_2_6_5-7.244.patch -rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/srp_7312_to_2_6_11.patch -rw-r--r-- vlad/vlad 319 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/srp_Makefile_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 948 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/srp_scsi_scan_target_7242_to_2_6_11.patch -rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/t3_hw_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 473 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/top_8109_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ucm_5245_to_2_6_9.patch -rw-r--r-- vlad/vlad 2378 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/user_mad_4603_to_2_6_9.patch -rw-r--r-- vlad/vlad 1633 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/user_mad_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 3910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/uverbs_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 1706 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/uverbs_main_3935_to_2_6_9.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ -rw-r--r-- vlad/vlad 2671 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1565 
2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/2_misc_device_to_2_6_9.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/addr_3926_to_2_6_13.patch -rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/addr_4670_to_2_6_9.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/core_1sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 5770 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/core_4807_to_2_6_9.patch -rw-r--r-- vlad/vlad 8340 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/cxio_hal_to_2.6.14.patch -rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-01-header.patch -rw-r--r-- vlad/vlad 1665 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-03-iowrite32_copy.patch -rw-r--r-- vlad/vlad 4200 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-05-page-hacks-2.6.14.patch -rw-r--r-- vlad/vlad 1437 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-06-page-hacks-2.6.9.patch -rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-07-iounmap-2.6.9.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-09-sysfs-show-2.6.12.patch -rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-10-rlimit-2.6.9.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-13-class-2.6.9.patch -rw-r--r-- vlad/vlad 12734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1096 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-17-ipath_intr-2.6.18.patch -rw-r--r-- vlad/vlad 1221 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath_rev_for_2_6_22.patch -rw-r--r-- vlad/vlad 13882 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 6508 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1935 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 2825 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/iwch_cm_to_2_6_9_U2.patch -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/iwch_provider_to_2.6.9_U4.patch -rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/mthca_dev_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 908 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/rds_to_2_6_9.patch
-rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/sdp_7277_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4557 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_7312_to_2_6_11.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 948 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_scsi_scan_target_7242_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/t3_hw_to_2_6_5-7_244.patch
-rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ucm_5245_to_2_6_9.patch
-rw-r--r-- vlad/vlad 2378 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/user_mad_4603_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1706 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/uverbs_main_3935_to_2_6_9.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/
-rw-r--r-- vlad/vlad 2671 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/2_misc_device_to_2_6_9.patch
-rw-r--r-- vlad/vlad 3195 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/add_iscsi_session_wq.patch
-rw-r--r-- vlad/vlad 6569 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/add_open_iscsi.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/addr_4670_to_2_6_9.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/core_1sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 5770 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/core_4807_to_2_6_9.patch
-rw-r--r-- vlad/vlad 8340 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/cxio_hal_to_2.6.14.patch
-rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/fix_inclusion_order_iscsi_iser.patch
-rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-01-header.patch
-rw-r--r-- vlad/vlad 1665 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-02-dont-leak-info-to-userspace.patch
-rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-03-iowrite32_copy.patch
-rw-r--r-- vlad/vlad 4200 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-05-page-hacks-2.6.14.patch
-rw-r--r-- vlad/vlad 1437 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-06-page-hacks-2.6.9.patch
-rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-07-iounmap-2.6.9.patch
-rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-08-fs-get_sb-2.6.17.patch
-rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-09-sysfs-show-2.6.12.patch
-rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-10-rlimit-2.6.9.patch
-rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-13-class-2.6.9.patch
-rw-r--r-- vlad/vlad 12734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 1096 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-17-ipath_intr-2.6.18.patch
-rw-r--r-- vlad/vlad 1221 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath_rev_for_2_6_22.patch
-rw-r--r-- vlad/vlad 13882 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 6508 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1935 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 2825 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 2390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/iscsi_scsi_addons.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/iser_handle_non_sg_data.patch
-rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/iwch_cm_to_2_6_9_U3.patch
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/iwch_provider_to_2.6.9_U4.patch
-rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/makefile_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 9567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/qlgc_vnic_sysfs_nested_class_dev.patch
-rw-r--r-- vlad/vlad 908 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/rds_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1705 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/release_host_lock_before_eh.patch
-rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4557 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_7312_to_2_6_11.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 948 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/t3_hw_to_2_6_5-7_244.patch
-rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch
-rw-r--r-- vlad/vlad 2378 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1706 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/uverbs_to_2_6_17.patch
-rw-r--r-- vlad/vlad 1367 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/ipoib_crash_wa.patch
-rw-r--r-- vlad/vlad 5576 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/ipoib_napi_optional.patch
-rw-r--r-- vlad/vlad 779 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0010_ipoib_high_dma.patch
-rw-r--r-- vlad/vlad 10526 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0017_ipoib_sg.patch
-rw-r--r-- vlad/vlad 6726 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0019_hw_csum.patch
-rw-r--r-- vlad/vlad 4735 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0050_ipoib_checksum_offload.patch
-rw-r--r-- vlad/vlad 694 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0060_ipoib_qp_init_attr.patch
-rw-r--r-- vlad/vlad 8374 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0110_ipoib_lso.patch
-rw-r--r-- vlad/vlad 4438 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0120_ipoib_ethtool.patch
-rw-r--r-- vlad/vlad 17108 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0130_ipoib_lro.patch
-rw-r--r-- vlad/vlad 3223 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0160_ipoib_modify_cq.patch
-rw-r--r-- vlad/vlad 1268 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0220_control_lro.patch
-rw-r--r-- vlad/vlad 1465 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0240_cq_coal_ipoib_cm.path
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/iw_nes_200_to_2_6_13.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/mthca_dev_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 428 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/rds_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/sdp_7277_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_7312_to_2_6_11.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_cmd_to_2_6_22.patch
-rwxr-xr-x vlad/vlad 976 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_scsi_scan_target_7242_to_2_6_11.patch
-rw-r--r-- vlad/vlad 506 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/t3_hw_to_2_6_13.patch
-rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ucm_5245_to_2_6_9.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/user_mad_3935_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 839 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/uverbs_main_3935_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 743 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/mthca_dev_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 649 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/mthca_provider_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 428 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/rds_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/sdp_7277_to_2_6_13.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/user_mad_3935_to_2_6_11_FC4.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 839 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/uverbs_main_3935_to_2_6_11_FC4.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/iw_nes_200_to_2_6_13.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/sdp_7277_to_2_6_13.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 506 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/t3_hw_to_2_6_13.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/iw_nes_200_to_2_6_13.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/sdp_7277_to_2_6_13.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 506 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/t3_hw_to_2_6_13.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/sdp_7277_to_2_6_13.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.15/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_remove_eeh.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/addr_1_netevents_revert_to_2_6_17.patch 
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch -rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iscsi_02_865_to_2_6_9-19.patch -rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iser_sync_with_open_iscsi_2.0-865.13.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_0200_sset.patch -rw-r--r-- 
vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_to_2_6_16.patch 
-rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch -rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iscsi_02_865_to_2_6_9-19.patch -rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iser_sync_with_open_iscsi_2.0-865.13.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- 
vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6508 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1935 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2825 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch -rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iscsi_02_865_to_2_6_9-19.patch -rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iser_sync_with_open_iscsi_2.0-865.13.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.17/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-01-header.patch
-rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-02-dont-leak-info-to-userspace.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-08-fs-get_sb-2.6.17.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-21-warnings.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 4785 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipath-20-vmalloc_user-2.6.18.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch
-rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iscsi_02_865_to_2_6_9-19.patch
-rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iser_sync_with_open_iscsi_2.0-865.13.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 4785 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipath-20-vmalloc_user-2.6.18.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 4785 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipath-20-vmalloc_user-2.6.18.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch
-rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iscsi_02_865_to_2_6_9-19.patch
-rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iser_sync_with_open_iscsi_2.0-865.13.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 4785 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipath-20-vmalloc_user-2.6.18.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 523 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 524 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/ipoib_csum_offload_to_2.6.21.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1019 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/sdp_ia64.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 523 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 5367 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/ipoib_to_2.6.23.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1019 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/sdp_ia64.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 822 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 523 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 5367 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/ipoib_to_2.6.23.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1019 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/sdp_ia64.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 523 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 5367 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/ipoib_to_2.6.23.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1019 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/sdp_ia64.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/srp_0200_revert_srp_transport_to_2.6.23.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/2_misc_device_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/addr_4670_to_2_6_9.patch
-rw-r--r-- vlad/vlad 344 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/amso1100_makefile_to_2_6_9.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/core_1sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4827 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/core_4807_to_2_6_9U4.patch
-rwxr-xr-x vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/core_ib_verbs_to_2_6_9.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxio_hal_to_2.6.14.patch
-rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-01-header.patch
-rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-02-dont-leak-info-to-userspace.patch
-rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-03-iowrite32_copy.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 4258 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-05-page-hacks-2.6.14.patch
-rw-r--r-- vlad/vlad 1420 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-06-page-hacks-2.6.9.patch
-rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-07-iounmap-2.6.9.patch
-rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-08-fs-get_sb-2.6.17.patch
-rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-09-sysfs-show-2.6.12.patch
-rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-10-rlimit-2.6.9.patch
-rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-13-class-2.6.9.patch
-rw-r--r-- vlad/vlad 574 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-14-class-2.6.9_U4.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 5818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-19-remove-struct-device_attribute-attr-args.patch
-rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-21-warnings.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 104942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_01_sync_kernel_code_with_ofed_1_2_5.patch
-rw-r--r-- vlad/vlad 6054 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_02_add_to_2_6_9.patch
-rw-r--r-- vlad/vlad 2440 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_03_add_session_wq.patch
-rw-r--r-- vlad/vlad 444 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_04_inet_sock_to_opt.patch
-rw-r--r-- vlad/vlad 1622 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_05_release_host_lock_before_eh.patch
-rw-r--r-- vlad/vlad 2362 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_06_scsi_addons.patch
-rw-r--r-- vlad/vlad 1702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_01_revert_da9c0c770e775e655e3f77c96d91ee557b117adb.patch
-rw-r--r-- vlad/vlad 585 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_02_revert_d8196ed2181b4595eaf464a5bcbddb6c28649a39.patch
-rw-r--r-- vlad/vlad 3202 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_03_revert_1548271ece9e9312fd5feb41fd58773b56a71d39.patch
-rw-r--r-- vlad/vlad 1297 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_04_revert_77a23c21aaa723f6b0ffc4a701be8c8e5a32346d.patch
-rw-r--r-- vlad/vlad 683 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_05_revert_b2c6416736b847b91950bd43cc5153e11a1f83ee.patch
-rw-r--r-- vlad/vlad 670 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_06_revert_857ae0bdb72999936a28ce621e38e2e288c485da.patch
-rw-r--r-- vlad/vlad 637 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_07_revert_8ad5781ae9702a8f95cfdf30967752e4297613ee.patch
-rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_08_revert_0801c242a33426fddc005c2f559a3d2fa6fca7eb.patch
-rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_09_fix_inclusion_order.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_nes_200_to_2_6_13.patch -rw-r--r-- vlad/vlad 2001 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_nes_300_to_2_6_9.patch -rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iwch_cm_to_2_6_9_U4.patch -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iwch_provider_to_2.6.9_U4.patch -rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 812 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/mlx4_compiler_warning.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/mthca_dev_3465_to_2_6_11.patch -rw-r--r-- vlad/vlad 9567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/qlgc_vnic_sysfs_nested_class_dev.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 881 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/rds_to_2_6_9.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/sdp_7277_to_2_6_11.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 783 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_0300_include_linux_scatterlist_h.patch -rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_7312_to_2_6_11.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_cmd_to_2_6_22.patch -rwxr-xr-x vlad/vlad 976 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_scsi_scan_target_7242_to_2_6_11.patch -rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/t3_hw_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ucm_5245_to_2_6_9.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 3036 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/user_mad_4603_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1875 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/uverbs_main_3935_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1525 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/2_misc_device_to_2_6_9.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/addr_3926_to_2_6_13.patch -rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/addr_4670_to_2_6_9.patch -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/amso1100_makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 3624 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/backport_ehca_1_2.6.9.patch -rw-r--r-- vlad/vlad 25896 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/backport_ehca_2_rhel45_umap.patch -rw-r--r-- vlad/vlad 7924 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/backport_ehca_3_rhel45_dma.patch -rw-r--r-- vlad/vlad 942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/backport_ehca_4_rhel45_dma_fix.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/core_1sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 4827 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/core_4807_to_2_6_9U4.patch -rwxr-xr-x vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/core_ib_verbs_to_2_6_9.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0500_is_valid_ether_addr.patch -rw-r--r-- vlad/vlad 1493 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0600_simple_strtoul.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_remove_eeh.patch -rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxio_hal_to_2.6.14.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-01-header.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-03-iowrite32_copy.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 4258 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-05-page-hacks-2.6.14.patch -rw-r--r-- vlad/vlad 1420 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-06-page-hacks-2.6.9.patch -rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-07-iounmap-2.6.9.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-09-sysfs-show-2.6.12.patch -rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-10-rlimit-2.6.9.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-13-class-2.6.9.patch -rw-r--r-- vlad/vlad 574 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-14-class-2.6.9_U4.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 5818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-19-remove-struct-device_attribute-attr-args.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 104942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_01_sync_kernel_code_with_ofed_1_2_5.patch -rw-r--r-- vlad/vlad 6054 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_02_add_to_2_6_9.patch -rw-r--r-- vlad/vlad 2440 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_03_add_session_wq.patch -rw-r--r-- vlad/vlad 444 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_04_inet_sock_to_opt.patch 
-rw-r--r-- vlad/vlad 1622 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_05_release_host_lock_before_eh.patch -rw-r--r-- vlad/vlad 2362 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_06_scsi_addons.patch -rw-r--r-- vlad/vlad 1702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_01_revert_da9c0c770e775e655e3f77c96d91ee557b117adb.patch -rw-r--r-- vlad/vlad 585 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_02_revert_d8196ed2181b4595eaf464a5bcbddb6c28649a39.patch -rw-r--r-- vlad/vlad 3202 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_03_revert_1548271ece9e9312fd5feb41fd58773b56a71d39.patch -rw-r--r-- vlad/vlad 1297 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_04_revert_77a23c21aaa723f6b0ffc4a701be8c8e5a32346d.patch -rw-r--r-- vlad/vlad 683 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_05_revert_b2c6416736b847b91950bd43cc5153e11a1f83ee.patch -rw-r--r-- vlad/vlad 670 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_06_revert_857ae0bdb72999936a28ce621e38e2e288c485da.patch -rw-r--r-- vlad/vlad 637 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_07_revert_8ad5781ae9702a8f95cfdf30967752e4297613ee.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_08_revert_0801c242a33426fddc005c2f559a3d2fa6fca7eb.patch -rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_09_fix_inclusion_order.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 545 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_cxgb3_0300_idr.patch -rw-r--r-- 
vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_nes_200_to_2_6_13.patch -rw-r--r-- vlad/vlad 2001 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_nes_300_to_2_6_9.patch -rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iwch_cm_to_2_6_9_U4.patch -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iwch_provider_to_2.6.9_U4.patch -rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 812 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/mlx4_compiler_warning.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/mthca_dev_3465_to_2_6_11.patch -rw-r--r-- vlad/vlad 9567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/qlgc_vnic_sysfs_nested_class_dev.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 881 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/rds_to_2_6_9.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/sdp_7277_to_2_6_11.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 783 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_0300_include_linux_scatterlist_h.patch -rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_7312_to_2_6_11.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_cmd_to_2_6_22.patch -rwxr-xr-x vlad/vlad 976 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_scsi_scan_target_7242_to_2_6_11.patch -rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/t3_hw_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ucm_5245_to_2_6_9.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 3036 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/user_mad_4603_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1875 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/uverbs_main_3935_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1525 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/2_misc_device_to_2_6_9.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/addr_3926_to_2_6_13.patch -rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/addr_4670_to_2_6_9.patch -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/amso1100_makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 3624 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/backport_ehca_1_2.6.9.patch -rw-r--r-- vlad/vlad 25896 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/backport_ehca_2_rhel45_umap.patch -rw-r--r-- vlad/vlad 7924 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/backport_ehca_3_rhel45_dma.patch -rw-r--r-- vlad/vlad 942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/backport_ehca_4_rhel45_dma_fix.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/core_1sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 4827 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/core_4807_to_2_6_9U4.patch -rwxr-xr-x vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/core_ib_verbs_to_2_6_9.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0500_is_valid_ether_addr.patch -rw-r--r-- vlad/vlad 1493 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0600_simple_strtoul.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_remove_eeh.patch -rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxio_hal_to_2.6.14.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-01-header.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-03-iowrite32_copy.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 4258 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-05-page-hacks-2.6.14.patch -rw-r--r-- vlad/vlad 1420 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-06-page-hacks-2.6.9.patch -rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-07-iounmap-2.6.9.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-09-sysfs-show-2.6.12.patch -rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-10-rlimit-2.6.9.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-13-class-2.6.9.patch -rw-r--r-- vlad/vlad 574 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-14-class-2.6.9_U4.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 5818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-19-remove-struct-device_attribute-attr-args.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 104942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_01_sync_kernel_code_with_ofed_1_2_5.patch -rw-r--r-- vlad/vlad 6054 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_02_add_to_2_6_9.patch -rw-r--r-- vlad/vlad 2440 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_03_add_session_wq.patch -rw-r--r-- vlad/vlad 444 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_04_inet_sock_to_opt.patch -rw-r--r-- vlad/vlad 1622 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_05_release_host_lock_before_eh.patch -rw-r--r-- vlad/vlad 2362 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_06_scsi_addons.patch -rw-r--r-- vlad/vlad 1702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_01_revert_da9c0c770e775e655e3f77c96d91ee557b117adb.patch -rw-r--r-- vlad/vlad 585 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_02_revert_d8196ed2181b4595eaf464a5bcbddb6c28649a39.patch -rw-r--r-- vlad/vlad 3202 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_03_revert_1548271ece9e9312fd5feb41fd58773b56a71d39.patch -rw-r--r-- vlad/vlad 1297 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_04_revert_77a23c21aaa723f6b0ffc4a701be8c8e5a32346d.patch -rw-r--r-- vlad/vlad 683 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_05_revert_b2c6416736b847b91950bd43cc5153e11a1f83ee.patch -rw-r--r-- vlad/vlad 670 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_06_revert_857ae0bdb72999936a28ce621e38e2e288c485da.patch -rw-r--r-- vlad/vlad 637 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_07_revert_8ad5781ae9702a8f95cfdf30967752e4297613ee.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_08_revert_0801c242a33426fddc005c2f559a3d2fa6fca7eb.patch -rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_09_fix_inclusion_order.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 545 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_cxgb3_0300_idr.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_nes_200_to_2_6_13.patch -rw-r--r-- vlad/vlad 2001 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_nes_300_to_2_6_9.patch -rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iwch_cm_to_2_6_9_U4.patch -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iwch_provider_to_2.6.9_U4.patch -rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 812 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/mlx4_compiler_warning.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/mthca_dev_3465_to_2_6_11.patch -rw-r--r-- vlad/vlad 9567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/qlgc_vnic_sysfs_nested_class_dev.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 881 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/rds_to_2_6_9.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/sdp_7277_to_2_6_11.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 783 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_0300_include_linux_scatterlist_h.patch -rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_7312_to_2_6_11.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_cmd_to_2_6_22.patch -rwxr-xr-x vlad/vlad 976 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_scsi_scan_target_7242_to_2_6_11.patch -rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/t3_hw_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ucm_5245_to_2_6_9.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 3036 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/user_mad_4603_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1875 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/uverbs_main_3935_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1525 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ -rwxr-xr-x vlad/vlad 1865 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0010_response_timeout.patch 
-rw-r--r-- vlad/vlad 1591 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0020__iwcm_ordird.patch
-rw-r--r-- vlad/vlad 1329 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0030_tavor_quirk.patch
-rw-r--r-- vlad/vlad 1268 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0040_re-enable-device-removal.patch
-rw-r--r-- vlad/vlad 2303 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0050_rcma_cma_mra.patch
-rw-r--r-- vlad/vlad 2818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_established1.patch
-rw-r--r-- vlad/vlad 2609 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0010_dma_map_sg.patch
-rw-r--r-- vlad/vlad 1462 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0020_csum.patch
-rw-r--r-- vlad/vlad 1711 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0025_qp_create_flags.patch
-rw-r--r-- vlad/vlad 1731 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0030_lso.patch
-rw-r--r-- vlad/vlad 2050 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0040_modify_cq.patch
-rw-r--r-- vlad/vlad 23821 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0050_xrc.patch
-rw-r--r-- vlad/vlad 14035 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0060_xrc_file_desc.patch
-rw-r--r-- vlad/vlad 5473 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0080_kernel_xrc.patch
-rw-r--r-- vlad/vlad 1171 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0090_core_delete_redundant_check_for_DR_SMP.patch
-rw-r--r-- vlad/vlad 1685 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0100_core_Dont_modify_outgoing_DR_SMP_if_first_pa.patch
-rw-r--r-- vlad/vlad 20581 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0110_xrc_rcv.patch
-rw-r--r-- vlad/vlad 753 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0010_MSI-X_failure_path.patch
-rw-r--r-- vlad/vlad 1611 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0020_Use_wild_card_for_PCI_subdevice_ID_match.patch
-rw-r--r-- vlad/vlad 397 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_00300_add_ofed_version_tag.patch
-rw-r--r-- vlad/vlad 1223 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0030_Fix_resources_release.patch
-rw-r--r-- vlad/vlad 3642 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0040_Add_EEH_support.patch
-rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0050_FW_upgrade.patch
-rw-r--r-- vlad/vlad 7199 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0060_fix_interaction_with_pktgen.patch
-rw-r--r-- vlad/vlad 3612 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0070_sysfs_methods_clean_up.patch
-rw-r--r-- vlad/vlad 5234 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0080_HW_set_up_updates.patch
-rw-r--r-- vlad/vlad 1952 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0090_Fix_I-O_synchronization.patch
-rw-r--r-- vlad/vlad 8617 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0100_trim_trailing_whitespace.patch
-rw-r--r-- vlad/vlad 30508 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0210_Parity_initialization_for_T3C_adapters.patch
-rw-r--r-- vlad/vlad 2515 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0220_Fix_EEH_missing_softirq_blocking.patch
-rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0230_Handle_ARP_completions_that_mark_neighbors_stale.patch
-rw-r--r-- vlad/vlad 2354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0001_Add_missing_spaces_in_the_middle_of_format.patch
-rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0002_Forward_event_client_reregister_required.patch
-rw-r--r-- vlad/vlad 1063 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0003_Use_round_jiffies_for_EQ_polling_timer.patch
-rw-r--r-- vlad/vlad 1106 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0004_Remove_CQ_QP_link_before_destroying_QP.patch
-rw-r--r-- vlad/vlad 1908 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0005_Define_array_to_store_SMI_GSI_QPs.patch
-rw-r--r-- vlad/vlad 14839 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0006_Add_port_connection_autodetect_mode.patch
-rw-r--r-- vlad/vlad 9231 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0007_Prevent_RDMA_related_connection_failures.patch
-rw-r--r-- vlad/vlad 614 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0008_Prevent_sending_ud_packets_to_qp0.patch
-rw-r--r-- vlad/vlad 935 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0009_Update_sma_attr_also_in_case_of_disruptive.patch
-rw-r--r-- vlad/vlad 5963 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0010_Add_PMA_support.patch
-rw-r--r-- vlad/vlad 986 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0011_Alloc_firmware_context_with_GFP_ATOMIC.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0012_Change_version_number.patch
-rw-r--r-- vlad/vlad 3246 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath-22-memcpy_cachebypass.patch
-rw-r--r-- vlad/vlad 1324 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0030_improve_interrupt_handler_cache_footprin.patch
-rw-r--r-- vlad/vlad 4421 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0040_convert_the_semaphore_ipath_eep_s.patch
-rw-r--r-- vlad/vlad 2921 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0050_remove_dead_code_for_user_process_waiting.patch
-rw-r--r-- vlad/vlad 11481 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0060_fix_sendctrl_locking.patch
-rw-r--r-- vlad/vlad 1132 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0070_fix_return_error_number_for_ib_resize_cq.patch
-rw-r--r-- vlad/vlad 1129 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0080_fix_comments_for_ipath_create_srq.patch
-rw-r--r-- vlad/vlad 1353 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0090_better_comment_for_rmb_in_ipath_intr.patch
-rw-r--r-- vlad/vlad 1148 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0100_add_the_work_completion_error_code_to_the.patch
-rw-r--r-- vlad/vlad 1371 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0120_enable_loopback_of_DR_SMP_responses_from.patch
-rw-r--r-- vlad/vlad 2906 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0130_fix_RNR_NAK_handling.patch
-rw-r--r-- vlad/vlad 1482 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0140_cleanup_ipath_get_egrbuf.patch
-rw-r--r-- vlad/vlad 10304 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0150_kreceive_uses_portdata_rather_than_devdat.patch
-rw-r--r-- vlad/vlad 8155 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0160_generalize_some_macros_SHIFT.patch
-rw-r--r-- vlad/vlad 6446 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0170_changes_for_fields_moving_from_devdata_to.patch
-rw-r--r-- vlad/vlad 50685 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0180_header_file_changes_to_support_IBA7220.patch
-rw-r--r-- vlad/vlad 3290 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0190_isolate_7220_specific_content.patch
-rw-r--r-- vlad/vlad 86870 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0200_HCA_specific_code_to_support_IBA7220.patch
-rw-r--r-- vlad/vlad 33032 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0210_support_for_SerDes_portion_of_IBA7220.patch
-rw-r--r-- vlad/vlad 57778 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0220_add_IBA7220_specific_initialization_data.patch
-rw-r--r-- vlad/vlad 23073 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0230_add_code_for_IBA7220_send_DMA.patch
-rw-r--r-- vlad/vlad 3273 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0240_user_mode_send_DMA_header_file.patch
-rw-r--r-- vlad/vlad 22812 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0250_user_mode_send_DMA.patch
-rw-r--r-- vlad/vlad 86395 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0260_remaining_7220_changes_to_headers_and_af.patch
-rw-r--r-- vlad/vlad 3749 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0270_misc_changes_to_prepare_for_iba7220_intro.patch
-rw-r--r-- vlad/vlad 1718 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0280_cancel_send_DMA_buffers.patch
-rw-r--r-- vlad/vlad 39087 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0290_changes_to_IB_link_state_machine_handling.patch
-rw-r--r-- vlad/vlad 49741 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0300_error_handling_improvements_debuggabilit.patch
-rw-r--r-- vlad/vlad 15460 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0310_eeprom_support_for_7220_devices_robustne.patch
-rw-r--r-- vlad/vlad 8993 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0320_enable_use_of_4KB_MTU_via_module_paramate.patch
-rw-r--r-- vlad/vlad 7967 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0330_infrastructure_updates_for_sdma_support.patch
-rw-r--r-- vlad/vlad 6918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0370_enable_sdma_for_user_programs.patch
-rw-r--r-- vlad/vlad 4725 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0340_changes_to_support_PIO_bandwidth_check_on.patch
-rw-r--r-- vlad/vlad 44086 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0350_add_remaining_small_pieces_of_7220_suppor.patch
-rw-r--r-- vlad/vlad 41192 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0360_misc_changes_related_to_the_iba7220.patch
-rw-r--r-- vlad/vlad 3632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0380_set_ipath_lbus_info_where_bus_parameters.patch
-rw-r--r-- vlad/vlad 5111 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0390_fix_IB_compliance_problems_with_link_stat.patch
-rw-r--r-- vlad/vlad 1863 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0400_set_static_rate_and_VL15_flags_for_IBA722.patch
-rw-r--r-- vlad/vlad 62922 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0410_update.patch
-rw-r--r-- vlad/vlad 7759 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0420_ipoib_4k_mtu.patch
-rw-r--r-- vlad/vlad 871 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0430_dapl_rdma_read.patch
-rw-r--r-- vlad/vlad 973 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0010_Add-high-dma-support-to-ipoib.patch
-rw-r--r-- vlad/vlad 9243 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0020_Add-s-g-support-for-IPOIB.patch
-rw-r--r-- vlad/vlad 4649 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0040_checksum-offload.patch
-rw-r--r-- vlad/vlad 10194 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0050_Add-LSO-support.patch
-rw-r--r-- vlad/vlad 4567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0060_ethtool-support.patch
-rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0070_modiy_cq_params.patch
-rw-r--r-- vlad/vlad 1295 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0110_set_default_cq_patams.patch
-rw-r--r-- vlad/vlad 1826 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0120_check_grat_arp_with_cm.patch
-rw-r--r-- vlad/vlad 10825 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0180_split_cq.patch
-rw-r--r-- vlad/vlad 13253 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0190_unsig_udqp.patch
-rw-r--r-- vlad/vlad 20425 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0200_non_srq.patch
-rw-r--r-- vlad/vlad 2882 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0210_draft_wr.patch
-rw-r--r-- vlad/vlad 4898 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0220_ud_post_list.patch
-rw-r--r-- vlad/vlad 6181 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0230_srq_post_n.patch
-rw-r--r-- vlad/vlad 14279 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0240_4kmtu.patch
-rw-r--r-- vlad/vlad 1290 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0250_non_srq_param.patch
-rw-r--r-- vlad/vlad 970 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0260_pkey_change.patch
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0270_remove_alloc.patch
-rw-r--r-- vlad/vlad 7229 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0280_vmap.patch
-rw-r--r-- vlad/vlad 3001 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0290_reduce_cm_tx.patch
-rw-r--r-- vlad/vlad 2061 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0300_reap.patch
-rw-r--r-- vlad/vlad 1011 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0310_def_ring_sizes.patch
-rw-r--r-- vlad/vlad 1894 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0320_small_skb_copy.patch
-rw-r--r-- vlad/vlad 839 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0330_child_mtu.patch
-rw-r--r-- vlad/vlad 2299 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_selector_updated.patch
-rw-r--r-- vlad/vlad 1030 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iser_01_Print_information_about_unhandled_RDMA_CM_events.patch
-rw-r--r-- vlad/vlad 1570 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0020_Hold_rtnl_lock_around_ethtool_get_drvinfo_call.patch
-rw-r--r-- vlad/vlad 1996 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0030_Support_version_5.0_firmware.patch
-rw-r--r-- vlad/vlad 1968 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0040_Flush_the_RQ_when_closing.patch
-rw-r--r-- vlad/vlad 1284 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0050_fix_page_shift_calculation.patch
-rw-r--r-- vlad/vlad 1385 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0060_Mark_qp_as_privileged.patch
-rw-r--r-- vlad/vlad 2550 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0070_Fix_the_T3A_workaround_checks.patch
-rw-r--r-- vlad/vlad 3372 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mad_0010_enable_loopback_of_DR_SMP_responses_from_use.patch
-rw-r--r-- vlad/vlad 8923 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0010_add_wc.patch
-rw-r--r-- vlad/vlad 1131 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0015_set_cacheline_sz.patch
-rw-r--r-- vlad/vlad 1066 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0020_cmd_tout.patch
-rw-r--r-- vlad/vlad 5101 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0030_checksum_offload.patch
-rw-r--r-- vlad/vlad 724 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0040_qp_max_msg.patch
-rw-r--r-- vlad/vlad 2354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0045_qp_flags.patch
-rw-r--r-- vlad/vlad 9394 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0050_lso.patch
-rw-r--r-- vlad/vlad 6385 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0060_modify_cq.patch
-rw-r--r-- vlad/vlad 24057 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0070_xrc.patch
-rw-r--r-- vlad/vlad 3191 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0080_profile_parm.patch
-rw-r--r-- vlad/vlad 5511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0090_fix_sq_wrs.patch
-rw-r--r-- vlad/vlad 9449 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0120_xrc_kernel.patch
-rw-r--r-- vlad/vlad 1866 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0125_xrc_kernel_missed.patch
-rw-r--r-- vlad/vlad 793 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0150_increase_default_qp.patch
-rw-r--r-- vlad/vlad 17818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0170_shrinking_wqe.patch
-rw-r--r-- vlad/vlad 903 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0180_max_eqs.patch
-rw-r--r-- vlad/vlad 1042 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0190_bogus_qp_event.patch
-rw-r--r-- vlad/vlad 15928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0210_xrc_rcv.patch
-rw-r--r-- vlad/vlad 1432 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0220_enable_qos.patch
-rw-r--r-- vlad/vlad 2956 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0230_hw_id.patch
-rw-r--r-- vlad/vlad 1570 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0240_optimize_poll.patch
-rw-r--r-- vlad/vlad 14258 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0250_debug_output.patch
-rw-r--r-- vlad/vlad 1571 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0260_optimze_stamping.patch
-rw-r--r-- vlad/vlad 2156 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0270_fmr_enable.patch
-rw-r--r-- vlad/vlad 8901 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0280_diag_counters_sysfs.patch
-rw-r--r-- vlad/vlad 3447 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0290_mcast_loopback.patch
-rw-r--r-- vlad/vlad 971 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0300_bogus_qp.patch
-rw-r--r-- vlad/vlad 1245 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0310_date_version.patch
-rw-r--r-- vlad/vlad 599 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0001_catas_wqueue_namelen.patch
-rw-r--r-- vlad/vlad 2838 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0002_wrid_swap.patch
-rw-r--r-- vlad/vlad 5646 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0003_checksum_offload.patch
-rw-r--r-- vlad/vlad 4721 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0004_prelink_wqes.patch
-rw-r--r-- vlad/vlad 3037 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0005_hw_ver.patch
-rw-r--r-- vlad/vlad 3043 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0006_page_size_calc.patch
-rw-r--r-- vlad/vlad 1450 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0007_fmr_alloc_error.patch
-rw-r--r-- vlad/vlad 709 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0008_roland_fmr_alloc_fix.patch
-rw-r--r-- vlad/vlad 971 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0009_sg_init_table.patch
-rw-r--r-- vlad/vlad 920 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0010_bogus_qp.patch
-rw-r--r-- vlad/vlad 705 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0011_date_version.patch
-rw-r--r-- vlad/vlad 1484 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_cm_flush_workqueue.patch
-rw-r--r-- vlad/vlad 5666 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_cm_limit_mra_timeout.patch
-rw-r--r-- vlad/vlad 39321 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_1_notifications.patch
-rw-r--r-- vlad/vlad 41248 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_2_cache.patch
-rw-r--r-- vlad/vlad 773 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_3_disable.patch
-rw-r--r-- vlad/vlad 1211 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_4_fix_hang.patch
-rw-r--r-- vlad/vlad 2640 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch
-rwxr-xr-x vlad/vlad 883 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_2_disconnect_without_wait.patch
-rwxr-xr-x vlad/vlad 2506 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_3_qp_err_timer_reconnect_target.patch
-rw-r--r-- vlad/vlad 2897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_4_respect_target_credit_limit.patch
-rw-r--r-- vlad/vlad 12540 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_5_add_info_to_log_messages.patch
-rw-r--r-- vlad/vlad 4419 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_6_retry_stale_connections.patch
-rw-r--r-- vlad/vlad 972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/uverbs_warning.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/hpage_patches/
-rw-r--r-- vlad/vlad 5418 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/hpage_patches/hpages.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/lib/
-rw-r--r-- vlad/vlad 7010 2008-02-28 09:59:53 ofa_kernel-1.3/lib/klist.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/net/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/
-rw-r--r-- vlad/vlad 311 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/Kconfig
-rw-r--r-- vlad/vlad 569 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/Makefile
-rw-r--r-- vlad/vlad 15905 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/af_rds.c
-rw-r--r-- vlad/vlad 4934 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/bind.c
-rw-r--r-- vlad/vlad 11842 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/cong.c
-rw-r--r-- vlad/vlad 12563 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/connection.c
-rw-r--r-- vlad/vlad 6593 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib.c
-rw-r--r-- vlad/vlad 7355 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib.h
-rw-r--r-- vlad/vlad 19290 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_cm.c
-rw-r--r-- vlad/vlad 13630 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_rdma.c
-rw-r--r-- vlad/vlad 6879 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_rds.h
-rw-r--r-- vlad/vlad 27878 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_recv.c
-rw-r--r-- vlad/vlad 4913 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_ring.c
-rw-r--r-- vlad/vlad 21049 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_send.c
-rw-r--r-- vlad/vlad 2749 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_stats.c
-rw-r--r-- vlad/vlad 4952 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_sysctl.c
-rw-r--r-- vlad/vlad 6596 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/info.c
-rw-r--r-- vlad/vlad 1308 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/info.h
-rw-r--r-- vlad/vlad 4418 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/loop.c
-rw-r--r-- vlad/vlad 111 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/loop.h
-rw-r--r-- vlad/vlad 9446 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/message.c
-rw-r--r-- vlad/vlad 5915 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/page.c
-rw-r--r-- vlad/vlad 16448 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/rdma.c
-rw-r--r-- vlad/vlad 1907 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/rdma.h
-rw-r--r-- vlad/vlad 22289 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/rds.h
-rw-r--r-- vlad/vlad 14718 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/recv.c
-rw-r--r-- vlad/vlad 23194 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/send.c
-rw-r--r-- vlad/vlad 4223 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/stats.c
-rw-r--r-- vlad/vlad 4491 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/sysctl.c
-rw-r--r-- vlad/vlad 8073 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp.c
-rw-r--r-- vlad/vlad 2801 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp.h
-rw-r--r-- vlad/vlad 4243 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_connect.c
-rw-r--r-- vlad/vlad 5405 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_listen.c
-rw-r--r-- vlad/vlad 9580 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_recv.c
-rw-r--r-- vlad/vlad 7626 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_send.c
-rw-r--r-- vlad/vlad 2425 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_stats.c
-rw-r--r-- vlad/vlad 8234 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/threads.c
-rw-r--r-- vlad/vlad 4410 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/transport.c
lrwxrwxrwx vlad/vlad 0 2008-02-28 09:59:56 ofa_kernel-1.3/configure -> ofed_scripts/configure
lrwxrwxrwx vlad/vlad 0 2008-02-28 09:59:56 ofa_kernel-1.3/Makefile -> ofed_scripts/Makefile
lrwxrwxrwx vlad/vlad 0 2008-02-28 09:59:56 ofa_kernel-1.3/makefile -> ofed_scripts/makefile
-rw-r--r-- vlad/vlad 114 2008-02-28 09:59:54 ofa_kernel-1.3/BUILD_ID
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd ofa_kernel-1.3
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chown -Rhf root .
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chgrp -Rhf root .
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.52212
+ umask 022
+ cd /var/tmp/OFED_topdir/BUILD
+ /bin/rm -rf /var/tmp/OFED
++ dirname /var/tmp/OFED
+ /bin/mkdir -p /var/tmp
+ /bin/mkdir /var/tmp/OFED
+ cd ofa_kernel-1.3
+ rm -rf /var/tmp/OFED
+ cd /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3
+ mkdir -p /var/tmp/OFED//usr/local/ofed-1.3/src
+ cp -a /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 /var/tmp/OFED//usr/local/ofed-1.3/src
+ ./configure --prefix=/usr/local/ofed-1.3 --kernel-version 2.6.16-54-0.2.5_lustre.1.6.4.3smp --kernel-sources /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build --modules-dir /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/updates --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-cxgb3-mod --with-nes-mod --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-srp-target-mod --with-rds-mod --with-qlgc_vnic-mod
ofed_patch.mk does not exist. running ofed_patch.sh
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/ofed_scripts/ofed_patch.sh --kernel-version 2.6.16-54-0.2.5_lustre.1.6.4.3smp
Quilt does not exist... Going to use patch.
mkdir -p /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/patches
touch /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/patches/quiltrc
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0010_response_timeout.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 54 with fuzz 2 (offset -4 lines).
Hunk #2 succeeded at 2179 (offset 18 lines).
Hunk #3 succeeded at 2238 (offset 18 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0020__iwcm_ordird.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 1266 with fuzz 1 (offset 129 lines).
Hunk #2 succeeded at 1316 (offset 126 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0030_tavor_quirk.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 50 with fuzz 1 (offset 2 lines).
Hunk #2 succeeded at 1562 with fuzz 2 (offset 435 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0040_re-enable-device-removal.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 1130 (offset 8 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0050_rcma_cma_mra.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 1108 (offset 1 line).
Hunk #2 succeeded at 1130 (offset 1 line).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_established1.patch
patching file drivers/infiniband/ulp/sdp/sdp.h
Hunk #1 succeeded at 152 with fuzz 1 (offset 24 lines).
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
Hunk #1 succeeded at 764 (offset 265 lines).
patching file drivers/infiniband/ulp/sdp/sdp_cma.c
Hunk #1 succeeded at 162 (offset 5 lines).
Hunk #2 succeeded at 294 with fuzz 2 (offset 24 lines).
patching file drivers/infiniband/ulp/sdp/sdp_main.c
Hunk #1 succeeded at 759 (offset 196 lines).
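The "Hunk #N succeeded at L with fuzz F (offset M lines)" messages in this log come from patch(1)'s approximate matching: when the target tree has drifted from the tree the fix was generated against, patch searches nearby lines (the offset) and can tolerate some mismatched context (the fuzz). A minimal self-contained sketch of the offset case, using made-up files in a throwaway directory rather than anything from this build:

```shell
set -e
work=$(mktemp -d)
mkdir -p "$work/old/src" "$work/new/src" "$work/target"
printf 'one\ntwo\nthree\n' > "$work/old/src/f.c"
printf 'one\nTWO\nthree\n' > "$work/new/src/f.c"
# Record the change as a unified diff (diff exits 1 when files differ).
( cd "$work" && diff -u old/src/f.c new/src/f.c > fix.patch ) || true
# Prepend two lines so the hunk's recorded line numbers no longer match.
printf 'x\ny\none\ntwo\nthree\n' > "$work/target/f.c"
# -p2 strips the "old/src/" prefix from the diff header, analogous to the
# strip level implied by the "patching file drivers/..." lines above.
patch -d "$work/target" -p2 < "$work/fix.patch"
```

The two prepended lines make patch report the hunk as succeeding at an offset, just like the messages in this log; a fuzz report would appear instead if the context lines themselves had changed.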
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0010_dma_map_sg.patch patching file drivers/infiniband/core/device.c patching file drivers/infiniband/core/umem.c Hunk #1 succeeded at 46 with fuzz 2 (offset 6 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0020_csum.patch patching file include/rdma/ib_verbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0025_qp_create_flags.patch patching file drivers/infiniband/core/uverbs_cmd.c patching file include/rdma/ib_verbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0030_lso.patch patching file include/rdma/ib_verbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0040_modify_cq.patch patching file include/rdma/ib_verbs.h Hunk #1 succeeded at 984 (offset 10 lines). Hunk #2 succeeded at 1391 (offset 10 lines). patching file drivers/infiniband/core/verbs.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0050_xrc.patch patching file drivers/infiniband/core/uverbs_main.c patching file include/rdma/ib_user_verbs.h patching file drivers/infiniband/core/uverbs_cmd.c patching file drivers/infiniband/core/verbs.c patching file include/rdma/ib_verbs.h patching file drivers/infiniband/core/uverbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0060_xrc_file_desc.patch patching file drivers/infiniband/core/uverbs_cmd.c Hunk #5 succeeded at 1151 (offset 1 line). Hunk #6 succeeded at 1179 (offset 1 line). Hunk #7 succeeded at 2083 (offset 1 line). Hunk #8 succeeded at 2115 (offset 1 line). Hunk #9 succeeded at 2166 (offset 1 line). Hunk #10 succeeded at 2187 (offset 1 line). Hunk #11 succeeded at 2319 (offset 1 line). Hunk #12 succeeded at 2438 (offset 1 line). Hunk #13 succeeded at 2449 (offset 1 line). Hunk #14 succeeded at 2506 (offset 1 line). Hunk #15 succeeded at 2530 (offset 1 line). Hunk #16 succeeded at 2562 (offset 1 line). Hunk #17 succeeded at 2588 (offset 1 line). 
Hunk #18 succeeded at 2602 (offset 1 line). patching file include/rdma/ib_verbs.h Hunk #2 succeeded at 769 (offset 8 lines). Hunk #3 succeeded at 1092 (offset 8 lines). patching file drivers/infiniband/core/device.c patching file drivers/infiniband/core/uverbs_main.c patching file drivers/infiniband/core/uverbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0080_kernel_xrc.patch patching file include/rdma/ib_verbs.h Hunk #1 succeeded at 677 (offset 9 lines). Hunk #2 succeeded at 803 (offset 9 lines). Hunk #3 succeeded at 1256 (offset 9 lines). Hunk #4 succeeded at 1918 (offset 9 lines). patching file drivers/infiniband/core/verbs.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0090_core_delete_redundant_check_for_DR_SMP.patch patching file drivers/infiniband/core/mad.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0100_core_Dont_modify_outgoing_DR_SMP_if_first_pa.patch patching file drivers/infiniband/core/mad.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0110_xrc_rcv.patch patching file include/rdma/ib_verbs.h patching file drivers/infiniband/core/uverbs_main.c patching file drivers/infiniband/core/uverbs_cmd.c patching file include/rdma/ib_user_verbs.h patching file drivers/infiniband/core/uverbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0010_MSI-X_failure_path.patch patching file drivers/net/cxgb3/cxgb3_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0020_Use_wild_card_for_PCI_subdevice_ID_match.patch patching file drivers/net/cxgb3/cxgb3_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_00300_add_ofed_version_tag.patch patching file drivers/net/cxgb3/version.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0030_Fix_resources_release.patch patching file drivers/net/cxgb3/cxgb3_main.c 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0040_Add_EEH_support.patch patching file drivers/net/cxgb3/cxgb3_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0050_FW_upgrade.patch patching file drivers/net/cxgb3/t3_hw.c patching file drivers/net/cxgb3/version.h Hunk #1 succeeded at 38 with fuzz 1. /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0060_fix_interaction_with_pktgen.patch patching file drivers/net/cxgb3/sge.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0070_sysfs_methods_clean_up.patch patching file drivers/net/cxgb3/cxgb3_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0080_HW_set_up_updates.patch patching file drivers/net/cxgb3/cxgb3_main.c patching file drivers/net/cxgb3/regs.h patching file drivers/net/cxgb3/t3_hw.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0090_Fix_I-O_synchronization.patch patching file drivers/net/cxgb3/sge.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0100_trim_trailing_whitespace.patch patching file drivers/net/cxgb3/cxgb3_main.c patching file drivers/net/cxgb3/cxgb3_offload.c patching file drivers/net/cxgb3/firmware_exports.h patching file drivers/net/cxgb3/t3_hw.c patching file drivers/net/cxgb3/xgmac.c Hunk #1 succeeded at 153 (offset 5 lines). Hunk #2 succeeded at 187 (offset 5 lines). Hunk #3 succeeded at 336 (offset 6 lines). Hunk #4 succeeded at 449 (offset 14 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0210_Parity_initialization_for_T3C_adapters.patch patching file drivers/net/cxgb3/adapter.h patching file drivers/net/cxgb3/cxgb3_main.c patching file drivers/net/cxgb3/cxgb3_offload.c patching file drivers/net/cxgb3/regs.h patching file drivers/net/cxgb3/sge.c patching file drivers/net/cxgb3/t3_hw.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0220_Fix_EEH_missing_softirq_blocking.patch patching file drivers/net/cxgb3/cxgb3_main.c patching file drivers/net/cxgb3/sge.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0230_Handle_ARP_completions_that_mark_neighbors_stale.patch patching file drivers/net/cxgb3/l2t.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0001_Add_missing_spaces_in_the_middle_of_format.patch patching file drivers/infiniband/hw/ehca/ehca_cq.c patching file drivers/infiniband/hw/ehca/ehca_qp.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0002_Forward_event_client_reregister_required.patch patching file drivers/infiniband/hw/ehca/ehca_irq.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0003_Use_round_jiffies_for_EQ_polling_timer.patch patching file drivers/infiniband/hw/ehca/ehca_main.c Hunk #1 succeeded at 926 (offset 13 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0004_Remove_CQ_QP_link_before_destroying_QP.patch patching file drivers/infiniband/hw/ehca/ehca_qp.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0005_Define_array_to_store_SMI_GSI_QPs.patch patching file drivers/infiniband/hw/ehca/ehca_classes.h patching file drivers/infiniband/hw/ehca/ehca_main.c Hunk #1 succeeded at 511 (offset 13 lines). Hunk #2 succeeded at 537 (offset 13 lines). Hunk #3 succeeded at 550 (offset 13 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0006_Add_port_connection_autodetect_mode.patch
patching file drivers/infiniband/hw/ehca/ehca_classes.h
patching file drivers/infiniband/hw/ehca/ehca_irq.c
patching file drivers/infiniband/hw/ehca/ehca_iverbs.h
patching file drivers/infiniband/hw/ehca/ehca_main.c
patching file drivers/infiniband/hw/ehca/ehca_qp.c
patching file drivers/infiniband/hw/ehca/ehca_sqp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0007_Prevent_RDMA_related_connection_failures.patch
patching file drivers/infiniband/hw/ehca/ehca_classes.h
patching file drivers/infiniband/hw/ehca/ehca_qp.c
patching file drivers/infiniband/hw/ehca/ehca_reqs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0008_Prevent_sending_ud_packets_to_qp0.patch
patching file drivers/infiniband/hw/ehca/ehca_reqs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0009_Update_sma_attr_also_in_case_of_disruptive.patch
patching file drivers/infiniband/hw/ehca/ehca_irq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0010_Add_PMA_support.patch
patching file drivers/infiniband/hw/ehca/ehca_classes.h
patching file drivers/infiniband/hw/ehca/ehca_iverbs.h
patching file drivers/infiniband/hw/ehca/ehca_main.c
patching file drivers/infiniband/hw/ehca/ehca_sqp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0011_Alloc_firmware_context_with_GFP_ATOMIC.patch
patching file drivers/infiniband/hw/ehca/ehca_hca.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0012_Change_version_number.patch
patching file drivers/infiniband/hw/ehca/ehca_main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0030_improve_interrupt_handler_cache_footprin.patch
patching file drivers/infiniband/hw/ipath/ipath_intr.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0040_convert_the_semaphore_ipath_eep_s.patch
patching file drivers/infiniband/hw/ipath/ipath_eeprom.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0050_remove_dead_code_for_user_process_waiting.patch
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0060_fix_sendctrl_locking.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_ruc.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0070_fix_return_error_number_for_ib_resize_cq.patch
patching file drivers/infiniband/hw/ipath/ipath_cq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0080_fix_comments_for_ipath_create_srq.patch
patching file drivers/infiniband/hw/ipath/ipath_srq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0090_better_comment_for_rmb_in_ipath_intr.patch
patching file drivers/infiniband/hw/ipath/ipath_intr.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0100_add_the_work_completion_error_code_to_the.patch
patching file drivers/infiniband/hw/ipath/ipath_qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0120_enable_loopback_of_DR_SMP_responses_from.patch
patching file drivers/infiniband/hw/ipath/ipath_mad.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0130_fix_RNR_NAK_handling.patch
patching file drivers/infiniband/hw/ipath/ipath_rc.c
patching file drivers/infiniband/hw/ipath/ipath_ruc.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0140_cleanup_ipath_get_egrbuf.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0150_kreceive_uses_portdata_rather_than_devdat.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_stats.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0160_generalize_some_macros_SHIFT.patch
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0170_changes_for_fields_moving_from_devdata_to.patch
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_stats.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0180_header_file_changes_to_support_IBA7220.patch
patching file drivers/infiniband/hw/ipath/Makefile
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_debug.h
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0190_isolate_7220_specific_content.patch
patching file drivers/infiniband/hw/ipath/ipath_7220.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0200_HCA_specific_code_to_support_IBA7220.patch
patching file drivers/infiniband/hw/ipath/ipath_iba7220.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0210_support_for_SerDes_portion_of_IBA7220.patch
patching file drivers/infiniband/hw/ipath/ipath_sd7220.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0220_add_IBA7220_specific_initialization_data.patch
patching file drivers/infiniband/hw/ipath/ipath_sd7220_img.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0230_add_code_for_IBA7220_send_DMA.patch
patching file drivers/infiniband/hw/ipath/ipath_sdma.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0240_user_mode_send_DMA_header_file.patch
patching file drivers/infiniband/hw/ipath/ipath_user_sdma.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0250_user_mode_send_DMA.patch
patching file drivers/infiniband/hw/ipath/ipath_user_sdma.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0260_remaining_7220_changes_to_headers_and_af.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_stats.c
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0270_misc_changes_to_prepare_for_iba7220_intro.patch
patching file drivers/infiniband/hw/ipath/ipath_ruc.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0280_cancel_send_DMA_buffers.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0290_changes_to_IB_link_state_machine_handling.patch
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_mad.c
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0300_error_handling_improvements_debuggabilit.patch
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_stats.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0310_eeprom_support_for_7220_devices_robustne.patch
patching file drivers/infiniband/hw/ipath/ipath_eeprom.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0320_enable_use_of_4KB_MTU_via_module_paramate.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_mad.c
patching file drivers/infiniband/hw/ipath/ipath_qp.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0330_infrastructure_updates_for_sdma_support.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_rc.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0340_changes_to_support_PIO_bandwidth_check_on.patch
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0350_add_remaining_small_pieces_of_7220_suppor.patch
patching file drivers/infiniband/hw/ipath/Makefile
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0360_misc_changes_related_to_the_iba7220.patch
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_fs.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_keys.c
patching file drivers/infiniband/hw/ipath/ipath_mad.c
patching file drivers/infiniband/hw/ipath/ipath_qp.c
patching file drivers/infiniband/hw/ipath/ipath_stats.c
patching file drivers/infiniband/hw/ipath/ipath_ud.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0370_enable_sdma_for_user_programs.patch
patching file drivers/infiniband/hw/ipath/Makefile
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0380_set_ipath_lbus_info_where_bus_parameters.patch
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_iba7220.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0390_fix_IB_compliance_problems_with_link_stat.patch
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_mad.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0400_set_static_rate_and_VL15_flags_for_IBA722.patch
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0410_update.patch
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_eeprom.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_iba7220.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_qp.c
patching file drivers/infiniband/hw/ipath/ipath_rc.c
patching file drivers/infiniband/hw/ipath/ipath_sdma.c
patching file drivers/infiniband/hw/ipath/ipath_srq.c
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_ud.c
patching file drivers/infiniband/hw/ipath/ipath_user_sdma.c
patching file drivers/infiniband/hw/ipath/ipath_user_sdma.h
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
Hunk #1 succeeded at 703 (offset -6 lines).
Hunk #2 succeeded at 1094 (offset -6 lines).
Hunk #3 succeeded at 1396 (offset -6 lines).
Hunk #4 succeeded at 1413 (offset -6 lines).
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0420_ipoib_4k_mtu.patch
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_iba7220.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_rc.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
Hunk #1 succeeded at 169 (offset -6 lines).
Hunk #2 succeeded at 1180 (offset -6 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0430_dapl_rdma_read.patch
patching file drivers/infiniband/hw/ipath/ipath_rc.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath-22-memcpy_cachebypass.patch
patching file drivers/infiniband/hw/ipath/Makefile
Hunk #1 succeeded at 36 with fuzz 1 (offset 4 lines).
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0010_Add-high-dma-support-to-ipoib.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #1 succeeded at 1120 (offset 2 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0020_Add-s-g-support-for-IPOIB.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #2 succeeded at 344 (offset 2 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0040_checksum-offload.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0050_Add-LSO-support.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0060_ethtool-support.patch
patching file drivers/infiniband/ulp/ipoib/Makefile
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 522 with fuzz 2 (offset -5 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_etool.c
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #1 succeeded at 963 (offset -8 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0070_modiy_cq_params.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 305 with fuzz 1 (offset -3 lines).
Hunk #2 succeeded at 387 (offset -3 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_etool.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0110_set_default_cq_patams.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0120_check_grat_arp_with_cm.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #1 succeeded at 689 (offset -27 lines).
Hunk #2 succeeded at 710 (offset -27 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0180_split_cq.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib.h
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_etool.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0190_unsig_udqp.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0200_non_srq.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #2 succeeded at 259 (offset 1 line).
Hunk #3 succeeded at 308 (offset 1 line).
Hunk #4 succeeded at 552 (offset 1 line).
Hunk #5 succeeded at 590 (offset 1 line).
Hunk #6 succeeded at 613 (offset 1 line).
Hunk #7 succeeded at 645 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0210_draft_wr.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 328 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
Hunk #1 succeeded at 222 with fuzz 1 (offset 5 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0220_ud_post_list.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 98 (offset 1 line).
Hunk #2 succeeded at 328 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
Hunk #2 succeeded at 790 (offset -24 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
Hunk #1 succeeded at 222 (offset -4 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0230_srq_post_n.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 99 (offset 1 line).
Hunk #2 succeeded at 290 (offset 1 line).
Hunk #3 succeeded at 318 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0240_4kmtu.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #2 succeeded at 142 (offset 1 line).
Hunk #3 succeeded at 340 (offset 1 line).
Hunk #4 succeeded at 381 (offset 1 line).
Hunk #5 succeeded at 415 (offset 1 line).
Hunk #6 succeeded at 456 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
Hunk #9 succeeded at 848 (offset 7 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
patching file drivers/infiniband/ulp/ipoib/ipoib_vlan.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0250_non_srq_param.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
Hunk #2 succeeded at 1453 (offset 6 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0260_pkey_change.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
Hunk #1 succeeded at 959 (offset 7 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0270_remove_alloc.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 282 (offset 1 line).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0280_vmap.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 270 (offset 1 line).
Hunk #2 succeeded at 283 (offset 1 line).
Hunk #3 succeeded at 303 (offset 1 line).
Hunk #4 succeeded at 327 (offset 1 line).
Hunk #5 succeeded at 389 (offset 1 line).
Hunk #6 succeeded at 584 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0290_reduce_cm_tx.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 111 (offset 1 line).
Hunk #2 succeeded at 289 (offset 7 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0300_reap.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0310_def_ring_sizes.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0320_small_skb_copy.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 100 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0330_child_mtu.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_vlan.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_selector_updated.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #1 succeeded at 201 (offset 19 lines).
Hunk #2 succeeded at 491 (offset 37 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iser_01_Print_information_about_unhandled_RDMA_CM_events.patch
patching file drivers/infiniband/ulp/iser/iser_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0020_Hold_rtnl_lock_around_ethtool_get_drvinfo_call.patch
patching file drivers/infiniband/hw/cxgb3/iwch_provider.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0030_Support_version_5.0_firmware.patch
patching file drivers/infiniband/hw/cxgb3/iwch_qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0040_Flush_the_RQ_when_closing.patch
patching file drivers/infiniband/hw/cxgb3/iwch_qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0050_fix_page_shift_calculation.patch
patching file drivers/infiniband/hw/cxgb3/iwch_mem.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0060_Mark_qp_as_privileged.patch
patching file drivers/infiniband/hw/cxgb3/cxio_wr.h
patching file drivers/infiniband/hw/cxgb3/iwch_qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0070_Fix_the_T3A_workaround_checks.patch
patching file drivers/infiniband/hw/cxgb3/cxio_hal.c
patching file drivers/infiniband/hw/cxgb3/iwch_cm.c
patching file drivers/infiniband/hw/cxgb3/iwch_provider.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mad_0010_enable_loopback_of_DR_SMP_responses_from_use.patch
patching file drivers/infiniband/core/mad.c
patching file drivers/infiniband/core/smi.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0010_add_wc.patch
patching file drivers/infiniband/hw/mlx4/Makefile
patching file drivers/infiniband/hw/mlx4/main.c
Hunk #2 succeeded at 383 (offset 7 lines).
Hunk #3 succeeded at 701 (offset 89 lines).
patching file drivers/infiniband/hw/mlx4/wc.c
patching file drivers/infiniband/hw/mlx4/wc.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0015_set_cacheline_sz.patch
patching file drivers/net/mlx4/fw.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0020_cmd_tout.patch
patching file drivers/net/mlx4/cmd.c
Hunk #1 succeeded at 278 (offset 6 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0030_checksum_offload.patch
patching file drivers/infiniband/hw/mlx4/cq.c
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/qp.c
patching file drivers/net/mlx4/fw.c
patching file include/linux/mlx4/cq.h
patching file include/linux/mlx4/qp.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0040_qp_max_msg.patch
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #1 succeeded at 758 (offset -125 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0045_qp_flags.patch
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0050_lso.patch
patching file drivers/infiniband/hw/mlx4/cq.c
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/qp.c
patching file drivers/net/mlx4/fw.c
patching file drivers/net/mlx4/fw.h
patching file drivers/net/mlx4/main.c
patching file include/linux/mlx4/device.h
patching file include/linux/mlx4/qp.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0060_modify_cq.patch
patching file drivers/infiniband/hw/mlx4/main.c
Hunk #1 succeeded at 602 (offset -13 lines).
patching file drivers/infiniband/hw/mlx4/cq.c
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
Hunk #1 succeeded at 252 (offset 3 lines).
patching file drivers/net/mlx4/cq.c
patching file include/linux/mlx4/cq.h
patching file include/linux/mlx4/cmd.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0070_xrc.patch
patching file include/linux/mlx4/device.h
patching file drivers/infiniband/hw/mlx4/main.c
Hunk #1 succeeded at 104 (offset 1 line).
Hunk #2 succeeded at 449 (offset 1 line).
Hunk #3 succeeded at 659 (offset 1 line).
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
Hunk #2 succeeded at 136 (offset 4 lines).
Hunk #3 succeeded at 200 (offset 5 lines).
Hunk #4 succeeded at 280 (offset 5 lines).
patching file drivers/net/mlx4/xrcd.c
patching file drivers/net/mlx4/mlx4.h
patching file drivers/net/mlx4/main.c
patching file drivers/net/mlx4/srq.c
patching file drivers/net/mlx4/fw.c
patching file drivers/net/mlx4/fw.h
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #2 succeeded at 341 with fuzz 2 (offset 12 lines).
Hunk #3 succeeded at 376 (offset 12 lines).
Hunk #4 succeeded at 389 (offset 12 lines).
Hunk #5 succeeded at 424 (offset 12 lines).
Hunk #6 succeeded at 445 (offset 12 lines).
Hunk #7 succeeded at 463 (offset 12 lines).
Hunk #8 succeeded at 541 (offset 12 lines).
Hunk #9 succeeded at 549 (offset 12 lines).
Hunk #10 succeeded at 564 (offset 12 lines).
Hunk #11 succeeded at 581 (offset 12 lines).
Hunk #12 succeeded at 650 (offset 12 lines).
Hunk #13 succeeded at 798 (offset 12 lines).
Hunk #14 succeeded at 914 (offset 12 lines).
Hunk #15 succeeded at 1002 (offset 12 lines).
patching file drivers/infiniband/hw/mlx4/srq.c
patching file include/linux/mlx4/qp.h
patching file drivers/net/mlx4/Makefile
patching file drivers/net/mlx4/qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0080_profile_parm.patch
patching file drivers/net/mlx4/main.c
Hunk #2 succeeded at 562 (offset -2 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0090_fix_sq_wrs.patch
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #4 succeeded at 284 (offset -2 lines).
Hunk #5 succeeded at 292 (offset -2 lines).
Hunk #6 succeeded at 305 (offset -2 lines).
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0120_xrc_kernel.patch
patching file drivers/infiniband/hw/mlx4/cq.c
Hunk #2 succeeded at 331 with fuzz 2 (offset -8 lines).
Hunk #3 succeeded at 358 (offset -3 lines).
Hunk #4 succeeded at 392 (offset -3 lines).
Hunk #5 succeeded at 400 (offset -3 lines).
Hunk #6 succeeded at 534 (offset -1 lines).
Hunk #7 succeeded at 556 (offset -1 lines).
patching file drivers/net/mlx4/mlx4.h
patching file drivers/net/mlx4/srq.c
patching file include/linux/mlx4/device.h
patching file include/linux/mlx4/srq.h
patching file drivers/infiniband/hw/mlx4/srq.c
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #1 succeeded at 1395 (offset 19 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0125_xrc_kernel_missed.patch
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #2 succeeded at 1031 (offset 14 lines).
Hunk #3 succeeded at 1041 (offset 14 lines).
Hunk #4 succeeded at 1708 (offset 16 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0150_increase_default_qp.patch
patching file drivers/net/mlx4/main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0170_shrinking_wqe.patch
patching file drivers/infiniband/hw/mlx4/cq.c
Hunk #1 succeeded at 358 (offset 5 lines).
Hunk #2 succeeded at 402 (offset 5 lines).
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #5 succeeded at 353 with fuzz 2 (offset -2 lines).
Hunk #6 succeeded at 428 (offset -2 lines).
Hunk #7 succeeded at 476 with fuzz 2 (offset -4 lines).
Hunk #8 succeeded at 579 (offset -1 lines).
Hunk #9 succeeded at 1089 (offset -1 lines).
Hunk #10 succeeded at 1142 (offset -1 lines).
Hunk #11 succeeded at 1477 (offset -1 lines).
Hunk #12 succeeded at 1500 (offset -1 lines).
Hunk #13 succeeded at 1633 (offset -1 lines).
Hunk #14 succeeded at 1671 (offset -1 lines).
patching file drivers/net/mlx4/alloc.c
patching file include/linux/mlx4/device.h
patching file include/linux/mlx4/qp.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0180_max_eqs.patch
patching file drivers/net/mlx4/fw.c
Hunk #1 succeeded at 205 (offset 3 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0190_bogus_qp_event.patch
patching file drivers/net/mlx4/qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0210_xrc_rcv.patch
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/qp.c
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/cq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0220_enable_qos.patch
patching file drivers/net/mlx4/fw.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0230_hw_id.patch
patching file drivers/net/mlx4/fw.h
patching file drivers/net/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/net/mlx4/fw.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0240_optimize_poll.patch
patching file drivers/infiniband/hw/mlx4/cq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0250_debug_output.patch
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/qp.c
patching file drivers/infiniband/hw/mlx4/cq.c
patching file drivers/infiniband/hw/mlx4/srq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0260_optimze_stamping.patch
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #3 succeeded at 1153 (offset 41 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0270_fmr_enable.patch
patching file drivers/infiniband/hw/mlx4/mr.c
patching file drivers/net/mlx4/mr.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0280_diag_counters_sysfs.patch
patching file drivers/net/mlx4/fw.c
patching file include/linux/mlx4/device.h
patching file drivers/infiniband/hw/mlx4/main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0290_mcast_loopback.patch
patching file drivers/net/mlx4/mcg.c
patching file drivers/net/mlx4/main.c
patching file drivers/net/mlx4/mlx4.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0300_bogus_qp.patch
patching file drivers/net/mlx4/qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0310_date_version.patch
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/net/mlx4/mlx4.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0001_catas_wqueue_namelen.patch
patching file drivers/infiniband/hw/mthca/mthca_catas.c
Hunk #1 succeeded at 205 with fuzz 2.
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0002_wrid_swap.patch
patching file drivers/infiniband/hw/mthca/mthca_cq.c
Hunk #1 succeeded at 538 (offset 1 line).
Hunk #2 succeeded at 558 (offset 1 line).
patching file drivers/infiniband/hw/mthca/mthca_qp.c
Hunk #1 succeeded at 1766 (offset 76 lines).
Hunk #2 succeeded at 1883 (offset 73 lines).
Hunk #3 succeeded at 2109 (offset 41 lines).
Hunk #4 succeeded at 2222 with fuzz 2 (offset 30 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0003_checksum_offload.patch
patching file drivers/infiniband/hw/mthca/mthca_cmd.c
patching file drivers/infiniband/hw/mthca/mthca_cmd.h
patching file drivers/infiniband/hw/mthca/mthca_cq.c
Hunk #3 succeeded at 636 (offset -1 lines).
patching file drivers/infiniband/hw/mthca/mthca_main.c patching file drivers/infiniband/hw/mthca/mthca_qp.c patching file drivers/infiniband/hw/mthca/mthca_wqe.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0004_prelink_wqes.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c patching file drivers/infiniband/hw/mthca/mthca_srq.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0005_hw_ver.patch patching file drivers/infiniband/hw/mthca/mthca_cmd.c patching file drivers/infiniband/hw/mthca/mthca_main.c patching file drivers/infiniband/hw/mthca/mthca_provider.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0006_page_size_calc.patch patching file drivers/infiniband/hw/mthca/mthca_provider.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0007_fmr_alloc_error.patch patching file drivers/infiniband/hw/mthca/mthca_mr.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0008_roland_fmr_alloc_fix.patch patching file drivers/infiniband/hw/mthca/mthca_mr.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0009_sg_init_table.patch patching file drivers/infiniband/hw/mthca/mthca_memfree.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0010_bogus_qp.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0011_date_version.patch patching file drivers/infiniband/hw/mthca/mthca_dev.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_cm_flush_workqueue.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 3466 (offset -47 lines). Hunk #2 succeeded at 3512 (offset -47 lines). Hunk #3 succeeded at 3520 (offset -47 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_cm_limit_mra_timeout.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 53 (offset -1 lines). 
Hunk #2 succeeded at 917 (offset 18 lines). Hunk #3 succeeded at 1045 (offset 20 lines). Hunk #4 succeeded at 1449 (offset 20 lines). Hunk #5 succeeded at 2353 (offset 14 lines). Hunk #6 succeeded at 2764 (offset 16 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_1_notifications.patch patching file drivers/infiniband/core/Makefile patching file drivers/infiniband/core/notice.c patching file drivers/infiniband/core/sa.h patching file drivers/infiniband/core/sa_query.c patching file include/rdma/ib_sa.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_2_cache.patch patching file drivers/infiniband/core/Makefile patching file drivers/infiniband/core/local_sa.c patching file drivers/infiniband/core/multicast.c patching file drivers/infiniband/core/sa.h patching file drivers/infiniband/core/sa_query.c Hunk #1 succeeded at 461 (offset -3 lines). Hunk #2 succeeded at 780 (offset 22 lines). Hunk #3 succeeded at 846 (offset 18 lines). Hunk #4 succeeded at 1415 (offset 6 lines). Hunk #5 succeeded at 1434 (offset 6 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_3_disable.patch patching file drivers/infiniband/core/local_sa.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_4_fix_hang.patch patching file drivers/infiniband/core/local_sa.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 504 (offset 9 lines). Hunk #2 succeeded at 531 (offset 9 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_2_disconnect_without_wait.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 403 (offset 3 lines). Hunk #2 succeeded at 1273 (offset -21 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_3_qp_err_timer_reconnect_target.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 862 (offset -22 lines). Hunk #2 succeeded at 893 (offset -22 lines). Hunk #3 succeeded at 1010 (offset -22 lines). (Stripping trailing CRs from patch.) patching file drivers/infiniband/ulp/srp/ib_srp.h Hunk #1 succeeded at 155 (offset -5 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_4_respect_target_credit_limit.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 958 (offset 28 lines). Hunk #2 succeeded at 1027 (offset 29 lines). Hunk #3 succeeded at 1214 (offset 29 lines). Hunk #4 succeeded at 1313 (offset 28 lines). patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_5_add_info_to_log_messages.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 272 (offset 3 lines). Hunk #2 succeeded at 304 (offset 3 lines). Hunk #3 succeeded at 381 (offset 3 lines). Hunk #4 succeeded at 403 (offset 3 lines). Hunk #5 succeeded at 571 (offset 4 lines). Hunk #6 succeeded at 687 (offset 4 lines). Hunk #7 succeeded at 791 (offset 4 lines). Hunk #8 succeeded at 837 (offset 4 lines). Hunk #9 succeeded at 859 (offset 4 lines). Hunk #10 succeeded at 901 (offset 4 lines). Hunk #11 succeeded at 1067 (offset 4 lines). Hunk #12 succeeded at 1081 (offset 4 lines). Hunk #13 succeeded at 1136 (offset 4 lines). Hunk #14 succeeded at 1162 (offset 4 lines). Hunk #15 succeeded at 1188 (offset 4 lines). Hunk #16 succeeded at 1217 (offset 4 lines). Hunk #17 succeeded at 1233 (offset 4 lines). Hunk #18 succeeded at 1280 (offset 4 lines). Hunk #19 succeeded at 1307 (offset 4 lines). Hunk #20 succeeded at 1385 (offset 4 lines). Hunk #21 succeeded at 1415 (offset 4 lines). Hunk #22 succeeded at 1442 (offset 4 lines). 
Hunk #23 succeeded at 1867 (offset 17 lines). Hunk #24 succeeded at 1896 (offset 17 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_6_retry_stale_connections.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 204 (offset 3 lines). Hunk #2 succeeded at 451 (offset 4 lines). Hunk #3 succeeded at 484 (offset 4 lines). Hunk #4 succeeded at 538 (offset 4 lines). Hunk #5 succeeded at 556 (offset 4 lines). Hunk #6 succeeded at 1226 (offset 4 lines). Hunk #7 succeeded at 1918 (offset 17 lines). patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/uverbs_warning.patch patching file drivers/infiniband/core/uverbs_cmd.c Applying patches for 2.6.16 kernel: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/1_struct_path_revert_to_2_6_19.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 564 (offset 30 lines). patching file drivers/infiniband/hw/ipath/ipath_file_ops.c Hunk #1 succeeded at 1890 with fuzz 1 (offset 146 lines). patching file drivers/infiniband/hw/ipath/ipath_fs.c Hunk #1 succeeded at 114 (offset -3 lines). Hunk #2 succeeded at 154 (offset -5 lines). Hunk #3 succeeded at 207 (offset -5 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/2_misc_device_to_2_6_19.patch patching file drivers/infiniband/core/ucma.c Hunk #1 succeeded at 1109 (offset 262 lines). Hunk #2 succeeded at 1123 (offset 262 lines). Hunk #3 succeeded at 1137 (offset 262 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/addr_1_netevents_revert_to_2_6_17.patch patching file drivers/infiniband/core/addr.c Hunk #2 succeeded at 351 (offset -2 lines). Hunk #3 succeeded at 378 (offset -2 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/core_sysfs_to_2_6_23.patch patching file drivers/infiniband/core/sysfs.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxg3_to_2_6_20.patch patching file drivers/net/cxgb3/cxgb3_main.c Hunk #1 succeeded at 76 with fuzz 2. Hunk #2 succeeded at 483 (offset 36 lines). Hunk #3 succeeded at 494 (offset 35 lines). Hunk #4 succeeded at 525 (offset 35 lines). Hunk #5 succeeded at 547 (offset 35 lines). Hunk #6 succeeded at 567 (offset 35 lines). Hunk #7 succeeded at 619 (offset 35 lines). Hunk #8 succeeded at 644 (offset 35 lines). Hunk #9 succeeded at 664 (offset 35 lines). Hunk #10 succeeded at 1012 (offset 49 lines). Hunk #11 succeeded at 1037 (offset 49 lines). Hunk #12 succeeded at 2729 (offset 155 lines). Hunk #13 succeeded at 2760 with fuzz 1 (offset 155 lines). patching file drivers/net/cxgb3/cxgb3_offload.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0100_napi.patch patching file drivers/net/cxgb3/adapter.h patching file drivers/net/cxgb3/cxgb3_main.c Hunk #9 succeeded at 2713 (offset -6 lines). Hunk #10 succeeded at 2814 (offset -6 lines). patching file drivers/net/cxgb3/sge.c Hunk #5 succeeded at 1649 (offset 5 lines). Hunk #6 succeeded at 1686 (offset 5 lines). Hunk #7 succeeded at 1735 (offset 5 lines). Hunk #8 succeeded at 2082 (offset 5 lines). Hunk #9 succeeded at 2208 (offset 5 lines). Hunk #10 succeeded at 2220 (offset 5 lines). Hunk #11 succeeded at 2240 (offset 5 lines). Hunk #12 succeeded at 2289 (offset 5 lines). Hunk #13 succeeded at 2314 (offset 5 lines). Hunk #14 succeeded at 2420 (offset 5 lines). Hunk #15 succeeded at 2435 (offset 5 lines). Hunk #16 succeeded at 2545 (offset 5 lines). Hunk #17 succeeded at 2557 (offset 5 lines). Hunk #18 succeeded at 2593 (offset 5 lines). Hunk #19 succeeded at 2618 (offset 5 lines). Hunk #20 succeeded at 2739 (offset 5 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0200_sset.patch patching file drivers/net/cxgb3/cxgb3_main.c Hunk #1 succeeded at 1246 (offset 115 lines). Hunk #2 succeeded at 1755 (offset 115 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0300_sysfs.patch patching file drivers/net/cxgb3/cxgb3_main.c Hunk #1 succeeded at 1050 (offset 83 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_main_to_2_6_22.patch patching file drivers/net/cxgb3/cxgb3_main.c Hunk #1 succeeded at 1761 with fuzz 2 (offset 178 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_makefile_to_2_6_19.patch patching file drivers/net/cxgb3/Makefile /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ehca_01_ibmebus_loc_code.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ehca/ehca_classes.h Hunk #1 succeeded at 111 (offset 4 lines). (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ehca/ehca_eq.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ehca/ehca_main.c Hunk #1 succeeded at 429 (offset 11 lines). Hunk #2 succeeded at 683 (offset 11 lines). Hunk #3 succeeded at 691 with fuzz 2 (offset 11 lines). Hunk #4 succeeded at 713 with fuzz 2 (offset 13 lines). Hunk #5 succeeded at 791 (offset 13 lines). Hunk #6 succeeded at 841 (offset 13 lines). Hunk #7 succeeded at 897 (offset 13 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipath-04-aio_write.patch patching file drivers/infiniband/hw/ipath/ipath_file_ops.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0100_to_2.6.21.patch patching file drivers/infiniband/ulp/ipoib/ipoib.h Hunk #1 succeeded at 358 (offset 18 lines). Hunk #2 succeeded at 414 (offset 20 lines). Hunk #3 succeeded at 509 (offset 20 lines). 
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c Hunk #1 succeeded at 270 (offset 1 line). Hunk #2 succeeded at 286 (offset 1 line). Hunk #3 succeeded at 331 (offset 1 line). Hunk #4 succeeded at 377 (offset 1 line). Hunk #5 succeeded at 399 (offset 1 line). Hunk #6 succeeded at 412 (offset 1 line). Hunk #11 succeeded at 824 with fuzz 2 (offset -29 lines). Hunk #12 succeeded at 904 (offset -28 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 99 (offset 1 line). Hunk #2 succeeded at 141 (offset 1 line). Hunk #3 succeeded at 544 (offset 1 line). Hunk #4 succeeded at 609 (offset 1 line). Hunk #5 succeeded at 658 (offset 1 line). Hunk #6 succeeded at 677 (offset 1 line). Hunk #7 succeeded at 748 (offset 1 line). Hunk #8 succeeded at 774 (offset 1 line). Hunk #9 succeeded at 803 (offset 1 line). Hunk #10 succeeded at 891 (offset 1 line). Hunk #11 succeeded at 1013 (offset 45 lines). Hunk #12 succeeded at 1022 (offset 45 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c Hunk #1 succeeded at 583 (offset 12 lines). Hunk #2 succeeded at 633 (offset 26 lines). Hunk #3 succeeded at 651 with fuzz 2 (offset 27 lines). Hunk #4 succeeded at 697 (offset 27 lines). Hunk #5 succeeded at 717 (offset 27 lines). Hunk #6 succeeded at 727 (offset 27 lines). Hunk #7 succeeded at 764 with fuzz 1 (offset 27 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c Hunk #3 succeeded at 691 (offset 39 lines). Hunk #4 succeeded at 706 (offset 39 lines). Hunk #5 succeeded at 721 (offset 39 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0110_restore_get_stats.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 788 (offset -2 lines). Hunk #2 succeeded at 1028 (offset 6 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0200_class_device_to_2_6_20.patch patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c Hunk #1 succeeded at 51 with fuzz 2 (offset 12 lines). Hunk #2 succeeded at 1401 (offset 178 lines). Hunk #3 succeeded at 1411 (offset 178 lines). Hunk #4 succeeded at 1449 with fuzz 1 (offset 175 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 93 (offset -2 lines). Hunk #2 succeeded at 1091 (offset 29 lines). Hunk #3 succeeded at 1130 with fuzz 1 (offset 29 lines). Hunk #4 succeeded at 1152 (offset 29 lines). Hunk #5 succeeded at 1171 with fuzz 1 (offset 29 lines). Hunk #6 succeeded at 1282 (offset 31 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_vlan.c Hunk #2 succeeded at 127 (offset 4 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0300_class_device_to_2_6_20_umcast.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 1099 (offset 37 lines). Hunk #2 succeeded at 1121 with fuzz 1 (offset 37 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0400_skb_to_2_6_20.patch patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_to_2_6_16.patch patching file drivers/infiniband/ulp/ipoib/ipoib.h Hunk #1 succeeded at 470 (offset 42 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 84 (offset -2 lines). Hunk #2 succeeded at 861 (offset -2 lines). Hunk #3 succeeded at 909 (offset 3 lines). Hunk #4 succeeded at 921 (offset 3 lines). Hunk #5 succeeded at 944 (offset 3 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_cxgb3_0100_namespace.patch patching file drivers/infiniband/hw/cxgb3/cxio_hal.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_cxgb3_0200_states.patch patching file drivers/infiniband/hw/cxgb3/iwch_cm.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_nes_100_to_2_6_23.patch patching file drivers/infiniband/hw/nes/nes_hw.c patching file drivers/infiniband/hw/nes/nes_hw.h patching file drivers/infiniband/hw/nes/nes_nic.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/linux_stuff_to_2_6_17.patch patching file drivers/infiniband/core/genalloc.c patching file drivers/infiniband/core/netevent.c patching file drivers/infiniband/core/Makefile Hunk #1 succeeded at 31 (offset 1 line). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/mlx4_0050_wc.patch patching file drivers/infiniband/hw/mlx4/wc.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/mthca_0001_pcix_to_2_6_22.patch patching file drivers/infiniband/hw/mthca/mthca_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/rds_to_2_6_20.patch patching file net/rds/sysctl.c Hunk #1 succeeded at 146 (offset 19 lines). patching file net/rds/ib_sysctl.c Hunk #1 succeeded at 151 (offset 23 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/sdp_0100_revert_to_2_6_23.patch patching file drivers/infiniband/ulp/sdp/sdp_main.c Hunk #1 succeeded at 2144 (offset 22 lines). Hunk #2 succeeded at 2162 (offset 22 lines). Hunk #3 succeeded at 2346 (offset 22 lines). Hunk #4 succeeded at 2360 (offset 22 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_0100_revert_role_to_2_6_23.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1643 (offset 84 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_0200_revert_srp_transport_to_2.6.23.patch patching file drivers/infiniband/ulp/srp/Kconfig patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #3 succeeded at 439 (offset 19 lines). Hunk #4 succeeded at 1628 (offset 84 lines). Hunk #5 succeeded at 1859 (offset 84 lines). Hunk #6 succeeded at 2120 (offset 84 lines). Hunk #7 succeeded at 2138 (offset 84 lines). Hunk #8 succeeded at 2150 (offset 84 lines). Hunk #9 succeeded at 2158 (offset 84 lines). Hunk #10 succeeded at 2171 (offset 84 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_cmd_to_2_6_22.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 505 (offset 50 lines). Hunk #2 succeeded at 518 (offset 50 lines). Hunk #3 succeeded at 730 (offset 46 lines). patching file drivers/infiniband/ulp/srp/ib_srp.h Hunk #1 succeeded at 112 (offset 6 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ucma_to_2_6_16.patch patching file drivers/infiniband/core/ucma.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ucm_to_2_6_16.patch patching file drivers/infiniband/core/ucm.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/user_mad_to_2_6_16.patch patching file drivers/infiniband/core/user_mad.c Hunk #1 succeeded at 849 (offset 18 lines). Hunk #2 succeeded at 926 (offset 21 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/uverbs_to_2_6_16.patch patching file drivers/infiniband/core/uverbs_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/uverbs_to_2_6_17.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 851 with fuzz 1 (offset 36 lines). 
Created ofed_patch.mk: BACKPORT_INCLUDES=-I${CWD}/kernel_addons/backport/2.6.16/include/ Created configure.mk.kernel: # Current working directory CWD=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 # Kernel level KVERSION=2.6.16-54-0.2.5_lustre.1.6.4.3smp ARCH=x86_64 MODULES_DIR=/lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/updates KSRC=/lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build AUTOCONF_H=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/linux/autoconf.h WITH_MAKE_PARAMS= CONFIG_MEMTRACK= CONFIG_DEBUG_INFO=y CONFIG_INFINIBAND=m CONFIG_INFINIBAND_IPOIB=m CONFIG_INFINIBAND_IPOIB_CM=y CONFIG_INFINIBAND_SDP=m CONFIG_INFINIBAND_SRP=m CONFIG_INFINIBAND_SRPT=m CONFIG_INFINIBAND_USER_MAD=m CONFIG_INFINIBAND_USER_ACCESS=m CONFIG_INFINIBAND_ADDR_TRANS=y CONFIG_INFINIBAND_USER_MEM=y CONFIG_INFINIBAND_MTHCA=m CONFIG_MLX4_CORE=m CONFIG_MLX4_INFINIBAND=m CONFIG_MLX4_DEBUG=y CONFIG_INFINIBAND_IPOIB_DEBUG=y CONFIG_INFINIBAND_ISER= CONFIG_SCSI_ISCSI_ATTRS= CONFIG_ISCSI_TCP= CONFIG_INFINIBAND_EHCA= CONFIG_INFINIBAND_EHCA_SCALING= CONFIG_RDS=m CONFIG_RDS_IB=m CONFIG_RDS_TCP=m CONFIG_RDS_DEBUG= CONFIG_INFINIBAND_MADEYE= CONFIG_INFINIBAND_QLGC_VNIC=m CONFIG_INFINIBAND_CXGB3=m CONFIG_CHELSIO_T3=m CONFIG_INFINIBAND_NES=m CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= CONFIG_INFINIBAND_SDP_SEND_ZCOPY= CONFIG_INFINIBAND_SDP_RECV_ZCOPY= CONFIG_INFINIBAND_SDP_DEBUG=y CONFIG_INFINIBAND_SDP_DEBUG_DATA= CONFIG_INFINIBAND_IPATH= CONFIG_INFINIBAND_MTHCA_DEBUG=y CONFIG_INFINIBAND_QLGC_VNIC_DEBUG= CONFIG_INFINIBAND_QLGC_VNIC_STATS= CONFIG_INFINIBAND_CXGB3_DEBUG= CONFIG_INFINIBAND_NES_DEBUG= CONFIG_INFINIBAND_AMSO1100= Created /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/linux/autoconf.h: #undef CONFIG_MEMTRACK #undef CONFIG_DEBUG_INFO #undef CONFIG_INFINIBAND #undef CONFIG_INFINIBAND_IPOIB #undef CONFIG_INFINIBAND_IPOIB_CM #undef CONFIG_INFINIBAND_SDP #undef CONFIG_INFINIBAND_SRP #undef CONFIG_INFINIBAND_SRPT #undef CONFIG_INFINIBAND_USER_MAD #undef CONFIG_INFINIBAND_USER_ACCESS #undef 
CONFIG_INFINIBAND_ADDR_TRANS #undef CONFIG_INFINIBAND_USER_MEM #undef CONFIG_INFINIBAND_MTHCA #undef CONFIG_MLX4_CORE #undef CONFIG_MLX4_DEBUG #undef CONFIG_MLX4_INFINIBAND #undef CONFIG_INFINIBAND_IPOIB_DEBUG #undef CONFIG_INFINIBAND_ISER #undef CONFIG_INFINIBAND_EHCA #undef CONFIG_INFINIBAND_EHCA_SCALING #undef CONFIG_RDS #undef CONFIG_RDS_IB #undef CONFIG_RDS_TCP #undef CONFIG_RDS_DEBUG #undef CONFIG_INFINIBAND_MADEYE #undef CONFIG_INFINIBAND_QLGC_VNIC #undef CONFIG_INFINIBAND_QLGC_VNIC_DEBUG #undef CONFIG_INFINIBAND_QLGC_VNIC_STATS #undef CONFIG_INFINIBAND_CXGB3 #undef CONFIG_INFINIBAND_CXGB3_DEBUG #undef CONFIG_CHELSIO_T3 #undef CONFIG_INFINIBAND_NES #undef CONFIG_INFINIBAND_NES_DEBUG #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY #undef CONFIG_INFINIBAND_SDP_DEBUG #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #undef CONFIG_INFINIBAND_IPATH #undef CONFIG_INFINIBAND_MTHCA_DEBUG #undef CONFIG_INFINIBAND_AMSO1100 #define CONFIG_INFINIBAND 1 #define CONFIG_INFINIBAND_IPOIB 1 #define CONFIG_INFINIBAND_IPOIB_CM 1 #define CONFIG_INFINIBAND_SDP 1 #define CONFIG_INFINIBAND_SRP 1 #define CONFIG_INFINIBAND_SRPT 1 #define CONFIG_INFINIBAND_USER_MAD 1 #define CONFIG_INFINIBAND_USER_ACCESS 1 #define CONFIG_INFINIBAND_ADDR_TRANS 1 #define CONFIG_INFINIBAND_USER_MEM 1 #define CONFIG_INFINIBAND_MTHCA 1 #define CONFIG_INFINIBAND_QLGC_VNIC 1 #define CONFIG_INFINIBAND_CXGB3 1 #define CONFIG_CHELSIO_T3 1 #define CONFIG_INFINIBAND_NES 1 #define CONFIG_INFINIBAND_IPOIB_DEBUG 1 #undef CONFIG_INFINIBAND_ISER #undef CONFIG_SCSI_ISCSI_ATTRS #undef CONFIG_ISCSI_TCP #undef CONFIG_INFINIBAND_EHCA #define CONFIG_RDS 1 #define CONFIG_RDS_IB 1 #define CONFIG_RDS_TCP 1 #undef CONFIG_RDS_DEBUG #undef CONFIG_INFINIBAND_QLGC_VNIC_DEBUG #undef CONFIG_INFINIBAND_QLGC_VNIC_STATS #undef CONFIG_INFINIBAND_CXGB3_DEBUG #undef CONFIG_INFINIBAND_NES_DEBUG #define CONFIG_MLX4_CORE 1 #define CONFIG_MLX4_INFINIBAND 1 #define 
CONFIG_MLX4_DEBUG 1 #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY #define CONFIG_INFINIBAND_SDP_DEBUG 1 #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #undef CONFIG_INFINIBAND_IPATH #define CONFIG_INFINIBAND_MTHCA_DEBUG 1 #undef CONFIG_INFINIBAND_MADEYE #undef CONFIG_INFINIBAND_AMSO1100 + install -d /var/tmp/OFED//usr/local/ofed-1.3/src/ofa_kernel + cp -a /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/ /var/tmp/OFED//usr/local/ofed-1.3/src/ofa_kernel + cp -a /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/configure.mk.kernel /var/tmp/OFED//usr/local/ofed-1.3/src/ofa_kernel + cd /var/tmp/OFED//usr/local/ofed-1.3/src/ + ln -s ofa_kernel openib + cd - /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 + make kernel Building kernel modules Kernel version: 2.6.16-54-0.2.5_lustre.1.6.4.3smp Modules directory: //lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/updates Kernel sources: /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build env CWD=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 BACKPORT_INCLUDES=-I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/ \ make -C /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build SUBDIRS="/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3" \ V=1 \ CONFIG_MEMTRACK= \ CONFIG_DEBUG_INFO=y \ CONFIG_INFINIBAND=m \ CONFIG_INFINIBAND_IPOIB=m \ CONFIG_INFINIBAND_IPOIB_CM=y \ CONFIG_INFINIBAND_SDP=m \ CONFIG_INFINIBAND_SRP=m \ CONFIG_INFINIBAND_SRPT=m \ CONFIG_INFINIBAND_USER_MAD=m \ CONFIG_INFINIBAND_USER_ACCESS=m \ CONFIG_INFINIBAND_USER_MEM=y \ CONFIG_INFINIBAND_ADDR_TRANS=y \ CONFIG_INFINIBAND_MTHCA=m \ CONFIG_INFINIBAND_IPOIB_DEBUG=y \ CONFIG_INFINIBAND_ISER= \ CONFIG_SCSI_ISCSI_ATTRS= \ CONFIG_ISCSI_TCP= \ CONFIG_INFINIBAND_EHCA= \ CONFIG_INFINIBAND_EHCA_SCALING= \ CONFIG_RDS=m \ CONFIG_RDS_IB=m \ CONFIG_RDS_TCP=m \ CONFIG_RDS_DEBUG= \ CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= \ CONFIG_INFINIBAND_SDP_SEND_ZCOPY= \ CONFIG_INFINIBAND_SDP_RECV_ZCOPY= \ 
CONFIG_INFINIBAND_SDP_DEBUG=y \ CONFIG_INFINIBAND_SDP_DEBUG_DATA= \ CONFIG_INFINIBAND_IPATH= \ CONFIG_INFINIBAND_MTHCA_DEBUG=y \ CONFIG_INFINIBAND_MADEYE= \ CONFIG_INFINIBAND_QLGC_VNIC=m \ CONFIG_INFINIBAND_QLGC_VNIC_DEBUG= \ CONFIG_INFINIBAND_QLGC_VNIC_STATS= \ CONFIG_CHELSIO_T3=m \ CONFIG_INFINIBAND_CXGB3=m \ CONFIG_INFINIBAND_CXGB3_DEBUG= \ CONFIG_INFINIBAND_NES=m \ CONFIG_INFINIBAND_NES_DEBUG= \ CONFIG_MLX4_CORE=m \ CONFIG_MLX4_INFINIBAND=m \ CONFIG_MLX4_ETHERNET= \ CONFIG_MLX4_DEBUG=y \ CONFIG_INFINIBAND_AMSO1100= \ LINUXINCLUDE=' \ -include include/linux/autoconf.h \ -include /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/ \ \ \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ -I/usr/local/include/scst \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ -Iinclude \ $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \ ' \ modules make[1]: Entering directory `/usr/src/linux-2.6.16-54-0.2.5_lustre.1.6.4.3' rm -rf /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/.tmp_versions mkdir -p /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/.tmp_versions make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core gcc -Wp,-MD,/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/.addr.o.d -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.1.0/include -D__KERNEL__ -include include/linux/autoconf.h -include /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/linux/autoconf.h -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/ 
-I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/debug -I/usr/local/include/scst -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 -Iinclude -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -Werror-implicit-function-declaration -fno-strict-aliasing -fno-common -ffreestanding -Os -mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(addr)" -D"KBUILD_MODNAME=KBUILD_STR(ib_addr)" -c -o /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/.tmp_addr.o /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c In file included from include/asm/processor.h:23, from include/linux/prefetch.h:14, from include/linux/list.h:7, from include/linux/mutex.h:13, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/mutex.h:5, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c:31: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/cpumask.h:6:1: warning: "for_each_possible_cpu" redefined In file included from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/cpumask.h:4, from include/asm/processor.h:23, from include/linux/prefetch.h:14, from include/linux/list.h:7, from include/linux/mutex.h:13, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/mutex.h:5, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c:31: include/linux/cpumask.h:411:1: warning: this is the location of the previous definition In file included from 
include/linux/if_ether.h:111, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/if_ether.h:4, from include/linux/netdevice.h:29, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:4, from include/linux/inetdevice.h:7, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/inetdevice.h:4, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c:32: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h: In function ‘backport_skb_linearize_to_2_6_17’: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h:13: error: too many arguments to function ‘skb_linearize’ /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h: At top level: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h:101: error: redefinition of ‘skb_is_gso’ include/linux/skbuff.h:1424: error: previous definition of ‘skb_is_gso’ was here /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h: In function ‘skb_is_gso’: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h:102: error: ‘struct skb_shared_info’ has no member named ‘tso_size’ In file included from include/linux/inetdevice.h:7, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/inetdevice.h:4, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c:32: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h: At top level: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:7: error: redefinition of ‘netif_tx_lock’ include/linux/netdevice.h:925: error: previous definition of ‘netif_tx_lock’ was here 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h: In function ‘netif_tx_lock’: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:8: error: ‘struct net_device’ has no member named ‘xmit_lock’ /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h: At top level: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:13: error: redefinition of ‘netif_tx_unlock’ include/linux/netdevice.h:945: error: previous definition of ‘netif_tx_unlock’ was here /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h: In function ‘netif_tx_unlock’: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:15: error: ‘struct net_device’ has no member named ‘xmit_lock’ make[4]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core] Error 2 make[2]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband] Error 2 make[1]: *** [_module_/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3] Error 2 make[1]: Leaving directory `/usr/src/linux-2.6.16-54-0.2.5_lustre.1.6.4.3' make: *** [kernel] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.52212 (%build) RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root Bad exit status from /var/tmp/rpm-tmp.52212 (%build) From bs at q-leap.de Tue Apr 8 02:35:35 2008 From: bs at q-leap.de (Bernd Schubert) Date: Tue, 8 Apr 2008 11:35:35 +0200 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <20080408014406.GA16864@sashak.voltaire.com> References: <200804041147.27565.bs@q-leap.de> <20080408014406.GA16864@sashak.voltaire.com> Message-ID: 
<200804081135.35846.bs@q-leap.de> Hello Sasha, On Tuesday 08 April 2008 03:44:06 Sasha Copyist wrote: > Hi Bernd, > > On 11:47 Fri 04 Apr , Bernd Schubert wrote: > > opensm-3.2.1 logs some error messages like this: > > > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> > > __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node > > 0 > > x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling > > list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 > > hop path: Path = 0,1,14,13 > > > > > > From ibnetdiscover output I see port 13 of this switch is a > > switch-interconnect (sorry, I don't know the correct name/identifier > > for switches within switches): > > > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid > > 263 4xSDR > > It is possible that port was DOWN during first subnet discovery. Finally > everything should be initialized after those messages. Isn't that the case > here? I think everything is initialized, but I don't think the port was down during first subnet discovery, since the port is on a spine board (I called it 'inter') to another switch system. We also never added any leaves to the switches. Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From erezz at voltaire.com Tue Apr 8 03:27:40 2008 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 08 Apr 2008 13:27:40 +0300 Subject: [ofa-general] [PATCH] IB/iSER: Release connection resources when receiving a RDMA_CM_EVENT_DEVICE_REMOVAL event Message-ID: <47FB489C.6030507@voltaire.com> When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should release the connection resources except for the rdma cm id (which will be released by the cma itself). This behavior is necessary if IB modules are unloaded while open-iscsi is still running. Currently, iSER just initiates a BUG() call.
Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iscsi_iser.h | 2 ++ drivers/infiniband/ulp/iser/iser_verbs.c | 18 ++++++++++++++---- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 1ee867b..9fe0b3f 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -249,6 +249,8 @@ struct iser_conn { struct iser_page_vec *page_vec; /* represents SG to fmr maps* * maps serialized as tx is*/ struct list_head conn_list; /* entry in ig conn list */ + wait_queue_head_t rem_wait; + int dev_removed; }; struct iscsi_iser_conn { diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 993f0a8..9beddb9 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -219,7 +219,8 @@ static int iser_free_ib_conn_res(struct iser_conn *ib_conn) if (ib_conn->qp != NULL) rdma_destroy_qp(ib_conn->cma_id); - if (ib_conn->cma_id != NULL) + /* if the device was removed, the cma will call rdma_destroy_id itself */ + if (ib_conn->cma_id != NULL && !ib_conn->dev_removed) rdma_destroy_id(ib_conn->cma_id); ib_conn->fmr_pool = NULL; @@ -325,7 +326,10 @@ static void iser_conn_release(struct iser_conn *ib_conn) iser_device_try_release(device); if (ib_conn->iser_conn) ib_conn->iser_conn->ib_conn = NULL; - kfree(ib_conn); + if (ib_conn->dev_removed) + wake_up_interruptible(&ib_conn->rem_wait); + else + kfree(ib_conn); } /** @@ -451,6 +455,7 @@ static void iser_disconnected_handler(struct rdma_cm_id *cma_id) static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) { int ret = 0; + struct iser_conn *ib_conn; iser_err("event %d conn %p id %p\n",event->event,cma_id->context,cma_id); @@ -476,8 +481,12 @@ static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *eve iser_disconnected_handler(cma_id); break; case 
RDMA_CM_EVENT_DEVICE_REMOVAL: - iser_err("Device removal is currently unsupported\n"); - BUG(); + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->dev_removed = 1; + iser_disconnected_handler(cma_id); + wait_event_interruptible(ib_conn->rem_wait, ib_conn->state == ISER_CONN_DOWN); + kfree(ib_conn); + ret = 1; break; default: iser_err("Unexpected RDMA CM event (%d)\n", event->event); @@ -497,6 +506,7 @@ int iser_conn_init(struct iser_conn **ibconn) } ib_conn->state = ISER_CONN_INIT; init_waitqueue_head(&ib_conn->wait); + init_waitqueue_head(&ib_conn->rem_wait); atomic_set(&ib_conn->post_recv_buf_count, 0); atomic_set(&ib_conn->post_send_buf_count, 0); INIT_LIST_HEAD(&ib_conn->conn_list); -- 1.5.3.6 Roland, This patch was built against your 2.6.26 branch. Can you add it to your list? Thanks, Erez
From moshek at voltaire.com Tue Apr 8 05:56:20 2008 From: moshek at voltaire.com (Moshe Kazir) Date: Tue, 8 Apr 2008 15:56:20 +0300 Subject: [ofa-general] ofed-1.3 uninstall.sh does not remove all the InfiniBand stack components properly on a full RHEL 4 U5 or RHEL 4 U6 installation. In-Reply-To: <47FAA913.7090805@opengridcomputing.com> References: <47FA3D60.3020905@opengridcomputing.com> <47FAA913.7090805@opengridcomputing.com> Message-ID: <39C75744D164D948A170E9792AF8E7CAC5AEED@exil.voltaire.com> Some rpm's (openmpi-libs, libmthca-devel, etc.) are not removed and cause dependency problems. The attached patch solves the problem. Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed_1.3_uninstall_sh.patch Type: application/octet-stream Size: 4249 bytes Desc: ofed_1.3_uninstall_sh.patch URL: From michael.heinz at qlogic.com Tue Apr 8 07:27:55 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 8 Apr 2008 09:27:55 -0500 Subject: [ofa-general] MVAPICH2 crashes on mixed fabric In-Reply-To: References: Message-ID: Wei, No joy.
The following command: + /usr/mpi/pgi/mvapich2-1.0.2/bin/mpiexec -1 -machinefile /home/mheinz/mvapich2-pgi/mpi_hosts -n 4 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 PMB2.2.1/SRC_PMB/PMB-MPI1 Produced the following error: [0] Abort: Got FATAL event 3 at line 796 in file ibv_channel_manager.c rank 0 in job 48 compute-0-3.local_33082 caused collective abort of all ranks exit status of rank 0: killed by signal 9 + set +x Note that compute-0-3 has a Connect-X HCA. If I restrict the ring to only nodes with Connect-X, the problem does not occur. This isn't a huge problem for me; this 4-node cluster is actually for testing the creation of Rocks Rolls and I can simply record it as a known limitation when using mvapich2 - but it could impact users in the field if a cluster gets extended with newer HCAs. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: wei huang [mailto:huanwei at cse.ohio-state.edu] Sent: Sunday, April 06, 2008 8:58 PM To: Mike Heinz Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] MVAPICH2 crashes on mixed fabric Hi Mike, Currently mvapich2 will detect different HCA types and thus select different parameters for communication, which may cause the problem. We are working on this feature and it will be available in our next release. For now, if you want to run on this setup, please set a few environment variables, like: mpiexec -n 2 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 ./a.out Please let us know if this works. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Fri, 4 Apr 2008, Mike Heinz wrote: > Hey, all, I'm not sure if this is a known bug or some sort of > limitation I'm unaware of, but I've been building and testing with the > OFED 1.3 GA release on a small fabric that has a mix of Arbel-based > and newer Connect-X HCAs.
> What I've discovered is that mvapich and openmpi work fine across the
> entire fabric, but mvapich2 crashes when I use a mix of Arbels and
> Connect-X. The errors vary depending on the test program but here's an
> example:
>
> [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
> .
> .
> .
> (output snipped)
> .
> .
> .
>
> #-----------------------------------------------------------------------------
> # Benchmarking Sendrecv
> # #processes = 2
> # ( 3 additional processes waiting in MPI_Barrier)
> #-----------------------------------------------------------------------------
>    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
>         0         1000         3.51         3.51         3.51         0.00
>         1         1000         3.63         3.63         3.63         0.52
>         2         1000         3.67         3.67         3.67         1.04
>         4         1000         3.64         3.64         3.64         2.09
>         8         1000         3.67         3.67         3.67         4.16
>        16         1000         3.67         3.67         3.67         8.31
>        32         1000         3.74         3.74         3.74        16.32
>        64         1000         3.90         3.90         3.90        31.28
>       128         1000         4.75         4.75         4.75        51.39
>       256         1000         5.21         5.21         5.21        93.79
>       512         1000         5.96         5.96         5.96       163.77
>      1024         1000         7.88         7.89         7.89       247.54
>      2048         1000        11.42        11.42        11.42       342.00
>      4096         1000        15.33        15.33        15.33       509.49
>      8192         1000        22.19        22.20        22.20       703.83
>     16384         1000        34.57        34.57        34.57       903.88
>     32768         1000        51.32        51.32        51.32      1217.94
>     65536          640        85.80        85.81        85.80      1456.74
>    131072          320       155.23       155.24       155.24      1610.40
>    262144          160       301.84       301.86       301.85      1656.39
>    524288           80       598.62       598.69       598.66      1670.31
>   1048576           40      1175.22      1175.30      1175.26      1701.69
>   2097152           20      2309.05      2309.05      2309.05      1732.32
>   4194304           10      4548.72      4548.98      4548.85      1758.64
>
> [0] Abort: Got FATAL event 3
> at line 796 in file ibv_channel_manager.c
> rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> If, however, I define my mpdring to contain only Connect-X systems OR
> only Arbel systems, IMB-MPI1 runs to completion.
>
> Can anyone suggest a workaround or is this a real bug with mvapich2?
> > -- > Michael Heinz > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania > > From Brian.Murrell at Sun.COM Tue Apr 8 07:43:09 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Tue, 08 Apr 2008 10:43:09 -0400 Subject: [ofa-general] kernel ib build (OFED 1.3) fails on SLES 10 In-Reply-To: <200804081013.52983.grossmann@hlrs.de> References: <200804081013.52983.grossmann@hlrs.de> Message-ID: <1207665789.13415.20.camel@pc.ilinx> On Tue, 2008-04-08 at 10:13 +0200, Thomas Großmann wrote: > Hi, Hi > kernel ib build (OFED 1.3) fails on SLES 10. To be fair, it fails on Sun's version of the SLES 10 kernel for Lustre, and here is why: > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.52212 > + umask 022 > + cd /var/tmp/OFED_topdir/BUILD > + /bin/rm -rf /var/tmp/OFED > ++ dirname /var/tmp/OFED > + /bin/mkdir -p /var/tmp > + /bin/mkdir /var/tmp/OFED > + cd ofa_kernel-1.3 > + rm -rf /var/tmp/OFED > + cd /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 > + mkdir -p /var/tmp/OFED//usr/local/ofed-1.3/src > + cp -a /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 /var/tmp/OFED//usr/local/ofed-1.3/src > + ./configure --prefix=/usr/local/ofed-1.3 --kernel-version 2.6.16-54-0.2.5_lustre.1.6.4.3smp --kernel-sources /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build --modules-dir /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/updates --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-cxgb3-mod --with-nes-mod --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-srp-target-mod --with-rds-mod --with-qlgc_vnic-mod > ofed_patch.mk does not exist. 
running ofed_patch.sh
> /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/ofed_scripts/ofed_patch.sh --kernel-version 2.6.16-54-0.2.5_lustre.1.6.4.3smp
--------------------------------------------------------------------------------------------------------------------^

This kernel version does not match what ofed_patch.sh thinks is a SLES 10 kernel, because it is not of the form "2.6.16.*-*-*". Here's the code in ofed_patch.sh which detects SLES 10 kernels and assigns the right patch series for it:

2.6.16.*-*-*)
    minor=$(echo $KVERSION | cut -d"." -f4 | cut -d"-" -f1)
    if [ $minor -lt 37 ]; then
        echo 2.6.16_sles10
    elif [ $minor -lt 60 ]; then
        echo 2.6.16_sles10_sp1
    else
        echo 2.6.16_sles10_sp2
    fi
    ;;

The lustre kernel version for SLES 10 is "2.6.16-54-0.2.5_lustre.1.6.4.3smp". In order for it to match the above code it needs to have a "-" put before the "smp" at the end. I am working on the Lustre build process to do exactly this right at this moment, as well as build our released RPMs with OFED 1.3 support right in them. My work is being done in Lustre bugzilla ticket 15316. When I have something working, I will post an attachment there with a patch for our current b1_6 that should apply to 1.6.4.3. In theory you should be able to use the "--with-backport*" configure options to override this detection when building the RPMs; however, see my message to this list (inconsistent use of --with-backport[-patches]) last Saturday about how this seems to be broken currently. Cheers, b. -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL:
From sashak at voltaire.com Tue Apr 8 11:31:13 2008 From: sashak at voltaire.com (Sasha Copyist) Date: Tue, 8 Apr 2008 18:31:13 +0000 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <200804081135.35846.bs@q-leap.de> References: <200804041147.27565.bs@q-leap.de> <20080408014406.GA16864@sashak.voltaire.com> <200804081135.35846.bs@q-leap.de> Message-ID: <20080408183113.GA18308@sashak.voltaire.com> Hi Bernd, [adding Yevgeny..]
On 11:35 Tue 08 Apr , Bernd Schubert wrote: > On Tuesday 08 April 2008 03:44:06 Sasha Copyist wrote: > > Hi Bernd, > > > > On 11:47 Fri 04 Apr , Bernd Schubert wrote: > > > opensm-3.2.1 logs some error messages like this: > > > > > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> > > > __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node > > > 0 > > > x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling > > > list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 > > > hop path: Path = 0,1,14,13 > > > > > > > > > From ibnetdiscover output I see port 13 of this switch is a > > > switch-interconnect (sorry, I don't know the correct name/identifier > > > for switches within switches): > > > > > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid > > > 263 4xSDR > > > > It is possible that port was DOWN during first subnet discovery. Finally > > everything should be initialized after those messages. Isn't that the case > > here? > > I think everything is initialized, but I don't think the port was down during > first subnet discovery, since the port is on a spine board (I called > it 'inter') to another switch system. We also never added any leaves to the > switches. That is an interesting phenomenon then. Yevgeny, are you aware of such an issue with Flextronics switches? Sasha From andrea at qumranet.com Tue Apr 8 08:44:07 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:07 +0200 Subject: [ofa-general] [PATCH 4 of 9] Move the tlb flushing into free_pgtables. The conversion of the locks In-Reply-To: Message-ID: <2c2ed514f294dbbfc661.1207669447@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207666463 -7200 # Node ID 2c2ed514f294dbbfc66157f771bc900789ac6005 # Parent 33de2e17d0f5670515833bf8d3d2ea19e2a85b09 Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map scanning would require taking sleeping locks in free_pgtables().
Moving the tlb flushing into free_pgtables allows sleeping in parts of free_pgtables(). This means that we do a tlb_finish_mmu() before freeing the page tables. Strictly speaking there may not be the need to do another tlb flush after freeing the tables. But its the only way to free a series of page table pages from the tlb list. And we do not want to call into the page allocator for performance reasons. Aim9 numbers look okay after this patch. Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -751,8 +751,8 @@ void *private); void free_pgd_range(struct mmu_gather **tlb, unsigned long addr, unsigned long end, unsigned long floor, unsigned long ceiling); -void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma, - unsigned long floor, unsigned long ceiling); +void free_pgtables(struct vm_area_struct *start_vma, unsigned long floor, + unsigned long ceiling); int copy_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *vma); void unmap_mapping_range(struct address_space *mapping, diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -272,9 +272,11 @@ } while (pgd++, addr = next, addr != end); } -void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma, - unsigned long floor, unsigned long ceiling) +void free_pgtables(struct vm_area_struct *vma, unsigned long floor, + unsigned long ceiling) { + struct mmu_gather *tlb; + while (vma) { struct vm_area_struct *next = vma->vm_next; unsigned long addr = vma->vm_start; @@ -286,8 +288,10 @@ unlink_file_vma(vma); if (is_vm_hugetlb_page(vma)) { - hugetlb_free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + hugetlb_free_pgd_range(&tlb, addr, vma->vm_end, floor, next? 
next->vm_start: ceiling); + tlb_finish_mmu(tlb, addr, vma->vm_end); } else { /* * Optimization: gather nearby vmas into one call down @@ -299,8 +303,10 @@ anon_vma_unlink(vma); unlink_file_vma(vma); } - free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + free_pgd_range(&tlb, addr, vma->vm_end, floor, next? next->vm_start: ceiling); + tlb_finish_mmu(tlb, addr, vma->vm_end); } vma = next; } diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1752,9 +1752,9 @@ mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, + tlb_finish_mmu(tlb, start, end); + free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - tlb_finish_mmu(tlb, start, end); mmu_notifier_invalidate_range_end(mm, start, end); } @@ -2051,8 +2051,8 @@ /* Use -1 here to ensure all VMAs in the mm are unmapped */ end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0); tlb_finish_mmu(tlb, 0, end); + free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* * Walk the list again, actually closing and freeing it, From andrea at qumranet.com Tue Apr 8 08:44:06 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:06 +0200 Subject: [ofa-general] [PATCH 3 of 9] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: Message-ID: <33de2e17d0f567051583.1207669446@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207666463 -7200 # Node ID 33de2e17d0f5670515833bf8d3d2ea19e2a85b09 # Parent baceb322b45ed43280654dac6c964c9d3d8a936f Moves all mmu notifier methods outside the PT lock (first and not last step to make them sleep capable). 
Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -117,27 +117,6 @@ INIT_HLIST_HEAD(&mm->mmu_notifier_list); } -#define ptep_clear_flush_notify(__vma, __address, __ptep) \ -({ \ - pte_t __pte; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __pte = ptep_clear_flush(___vma, ___address, __ptep); \ - mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ - __pte; \ -}) - -#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ -({ \ - int __young; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ - __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ - ___address); \ - __young; \ -}) - #else /* CONFIG_MMU_NOTIFIER */ static inline void mmu_notifier_release(struct mm_struct *mm) @@ -169,9 +148,6 @@ { } -#define ptep_clear_flush_young_notify ptep_clear_flush_young -#define ptep_clear_flush_notify ptep_clear_flush - #endif /* CONFIG_MMU_NOTIFIER */ #endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,11 +194,13 @@ if (pte) { /* Nuke the page table entry. 
*/ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); pte_unmap_unlock(pte, ptl); + /* must invalidate_page _before_ freeing the page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(page); } } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -1626,9 +1626,10 @@ */ page_table = pte_offset_map_lock(mm, pmd, address, &ptl); - page_cache_release(old_page); + new_page = NULL; if (!pte_same(*page_table, orig_pte)) goto unlock; + page_cache_release(old_page); page_mkwrite = 1; } @@ -1644,6 +1645,7 @@ if (ptep_set_access_flags(vma, address, page_table, entry,1)) update_mmu_cache(vma, address, entry); ret |= VM_FAULT_WRITE; + old_page = new_page = NULL; goto unlock; } @@ -1688,7 +1690,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush_notify(vma, address, page_table); + ptep_clear_flush(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); @@ -1700,12 +1702,18 @@ } else mem_cgroup_uncharge_page(new_page); - if (new_page) +unlock: + pte_unmap_unlock(page_table, ptl); + + if (new_page) { + if (new_page == old_page) + /* cow happened, notify before releasing old_page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(new_page); + } if (old_page) page_cache_release(old_page); -unlock: - pte_unmap_unlock(page_table, ptl); + if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -275,7 +275,7 @@ unsigned long address; pte_t *pte; spinlock_t *ptl; - int referenced = 0; + int referenced = 0, clear_flush_young = 0; address = vma_address(page, vma); if (address == -EFAULT) @@ -288,8 +288,11 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young_notify(vma, address, pte)) - referenced++; + } else { + clear_flush_young = 1; + if (ptep_clear_flush_young(vma, address, pte)) + referenced++; + } /* Pretend the page is referenced if the task has the swap token and is in the middle of a page fault. 
*/ @@ -299,6 +302,10 @@ (*mapcount)--; pte_unmap_unlock(pte, ptl); + + if (clear_flush_young) + referenced += mmu_notifier_clear_flush_young(mm, address); + out: return referenced; } @@ -457,7 +464,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush_notify(vma, address, pte); + entry = ptep_clear_flush(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -465,6 +472,10 @@ } pte_unmap_unlock(pte, ptl); + + if (ret) + mmu_notifier_invalidate_page(mm, address); + out: return ret; } @@ -717,15 +728,14 @@ * If it's recently referenced (perhaps page_referenced * skipped over this mm) then we should reactivate it. */ - if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young_notify(vma, address, pte)))) { + if (!migration && (vma->vm_flags & VM_LOCKED)) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. 
*/ if (pte_dirty(pteval)) @@ -780,6 +790,8 @@ out_unmap: pte_unmap_unlock(pte, ptl); + if (ret != SWAP_FAIL) + mmu_notifier_invalidate_page(mm, address); out: return ret; } @@ -818,7 +830,7 @@ spinlock_t *ptl; struct page *page; unsigned long address; - unsigned long end; + unsigned long start, end; address = (vma->vm_start + cursor) & CLUSTER_MASK; end = address + CLUSTER_SIZE; @@ -839,6 +851,8 @@ if (!pmd_present(*pmd)) return; + start = address; + mmu_notifier_invalidate_range_start(mm, start, end); pte = pte_offset_map_lock(mm, pmd, address, &ptl); /* Update high watermark before we lower rss */ @@ -850,12 +864,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young_notify(vma, address, pte)) + if (ptep_clear_flush_young(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) @@ -871,6 +885,7 @@ (*mapcount)--; } pte_unmap_unlock(pte - 1, ptl); + mmu_notifier_invalidate_range_end(mm, start, end); } static int try_to_unmap_anon(struct page *page, int migration) From andrea at qumranet.com Tue Apr 8 08:44:03 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:03 +0200 Subject: [ofa-general] [PATCH 0 of 9] mmu notifier #v12 Message-ID: The difference with #v11 is a different implementation of mm_lock that guarantees handling signals in O(N). It's also more lowlatency friendly. Note that mmu_notifier_unregister may also fail with -EINTR if there are signal pending or the system runs out of vmalloc space or physical memory, only exit_mmap guarantees that any kernel module can be unloaded in presence of an oom condition. 
Either #v11 or the first three #v12 patches (1, 2, 3) are suitable for inclusion in -mm; pick what you prefer after looking at the mmu_notifier_register retval and mm_lock retval difference; I implemented and slightly tested both. GRU and KVM only need 1, 2, 3; XPMEM needs the rest of the patchset too (4, ...), but all patches from 4 to the end can be deferred to a second merge window. From andrea at qumranet.com Tue Apr 8 08:44:04 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:04 +0200 Subject: [ofa-general] [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666462 -7200 # Node ID ec6d8f91b299cf26cce5c3d49bb25d35ee33c137 # Parent d4c25404de6376297ed34fada14cd6b894410eb0 Lock the entire mm to prevent any mmu related operation to happen. Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1050,6 +1050,15 @@ unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +struct mm_lock_data { + spinlock_t **i_mmap_locks; + spinlock_t **anon_vma_locks; + unsigned long nr_i_mmap_locks; + unsigned long nr_anon_vma_locks; +}; +extern struct mm_lock_data *mm_lock(struct mm_struct * mm); +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -2242,3 +2243,140 @@ return 0; } + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults.
The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. + */ +struct mm_lock_data *mm_lock(struct mm_struct * mm) +{ + struct vm_area_struct *vma; + spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; + unsigned long nr_i_mmap_locks, nr_anon_vma_locks, i; + struct mm_lock_data *data; + int err; + + down_write(&mm->mmap_sem); + + err = -EINTR; + nr_i_mmap_locks = nr_anon_vma_locks = 0; + for (vma = mm->mmap; vma; vma = vma->vm_next) { + cond_resched(); + if (unlikely(signal_pending(current))) + goto out; + + if (vma->vm_file && vma->vm_file->f_mapping) + nr_i_mmap_locks++; + if (vma->anon_vma) + nr_anon_vma_locks++; + } + + err = -ENOMEM; + data = kmalloc(sizeof(struct mm_lock_data), GFP_KERNEL); + if (!data) + goto out; + + if (nr_i_mmap_locks) { + data->i_mmap_locks = vmalloc(nr_i_mmap_locks * + sizeof(spinlock_t)); + if (!data->i_mmap_locks) + goto out_kfree; + } else + data->i_mmap_locks = NULL; + + if (nr_anon_vma_locks) { + data->anon_vma_locks = vmalloc(nr_anon_vma_locks * + sizeof(spinlock_t)); + if (!data->anon_vma_locks) + goto out_vfree; + } else + data->anon_vma_locks = NULL; + + err = -EINTR; + i_mmap_lock_last = NULL; + nr_i_mmap_locks = 0; + for (;;) { + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) { + cond_resched(); + if (unlikely(signal_pending(current))) + goto out_vfree_both; + + if (!vma->vm_file || !vma->vm_file->f_mapping) + continue; + if ((unsigned long) i_mmap_lock > + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock && + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock > + (unsigned long) i_mmap_lock_last) + i_mmap_lock = + &vma->vm_file->f_mapping->i_mmap_lock; + } + if (i_mmap_lock == (spinlock_t *) -1UL) + break; + i_mmap_lock_last = i_mmap_lock; + data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock; + } + data->nr_i_mmap_locks = nr_i_mmap_locks; + + anon_vma_lock_last = NULL; + nr_anon_vma_locks = 
0; + for (;;) { + spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) { + cond_resched(); + if (unlikely(signal_pending(current))) + goto out_vfree_both; + + if (!vma->anon_vma) + continue; + if ((unsigned long) anon_vma_lock > + (unsigned long) &vma->anon_vma->lock && + (unsigned long) &vma->anon_vma->lock > + (unsigned long) anon_vma_lock_last) + anon_vma_lock = &vma->anon_vma->lock; + } + if (anon_vma_lock == (spinlock_t *) -1UL) + break; + anon_vma_lock_last = anon_vma_lock; + data->anon_vma_locks[nr_anon_vma_locks++] = anon_vma_lock; + } + data->nr_anon_vma_locks = nr_anon_vma_locks; + + for (i = 0; i < nr_i_mmap_locks; i++) + spin_lock(data->i_mmap_locks[i]); + for (i = 0; i < nr_anon_vma_locks; i++) + spin_lock(data->anon_vma_locks[i]); + + return data; + +out_vfree_both: + vfree(data->anon_vma_locks); +out_vfree: + vfree(data->i_mmap_locks); +out_kfree: + kfree(data); +out: + up_write(&mm->mmap_sem); + return ERR_PTR(err); +} + +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) +{ + unsigned long i; + + for (i = 0; i < data->nr_i_mmap_locks; i++) + spin_unlock(data->i_mmap_locks[i]); + for (i = 0; i < data->nr_anon_vma_locks; i++) + spin_unlock(data->anon_vma_locks[i]); + + up_write(&mm->mmap_sem); + + vfree(data->i_mmap_locks); + vfree(data->anon_vma_locks); + kfree(data); +} From andrea at qumranet.com Tue Apr 8 08:44:05 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:05 +0200 Subject: [ofa-general] [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666462 -7200 # Node ID baceb322b45ed43280654dac6c964c9d3d8a936f # Parent ec6d8f91b299cf26cce5c3d49bb25d35ee33c137 Core of mmu notifiers. 
Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -225,6 +225,9 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct hlist_head mmu_notifier_list; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,177 @@ +#ifndef _LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; + +#ifdef CONFIG_MMU_NOTIFIER + +struct mmu_notifier_ops { + /* + * Called when nobody can register any more notifier in the mm + * and after the "mn" notifier has been disarmed already. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired. Multiple invalidate_range_start/ends may be nested + * or called concurrently. 
+ */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(!hlist_empty(&mm->mmu_notifier_list)); +} + +extern int mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern int mmu_notifier_unregister(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + 
__mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ + INIT_HLIST_HEAD(&mm->mmu_notifier_list); +} + +#define ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } diff --git a/mm/Kconfig b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ 
-193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,5 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- 
a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -611,6 +612,9 @@ if (is_vm_hugetlb_page(vma)) return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { @@ -621,6 +625,11 @@ vma, addr, next)) return -ENOMEM; } while (dst_pgd++, src_pgd++, addr = next, addr != end); + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); + return 0; } @@ -897,7 +906,9 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, address, end); end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); + mmu_notifier_invalidate_range_end(mm, address, end); if (tlb) tlb_finish_mmu(tlb, address, end); return end; @@ -1463,10 +1474,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1486,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1688,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -1748,11 +1749,13 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); tlb_finish_mmu(tlb, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2038,6 +2041,7 @@ unsigned long end; /* mm's last user has gone, and its about to be pulled down */ + mmu_notifier_release(mm); arch_exit_mmap(mm); lru_add_drain(); diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,126 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include +#include +#include +#include + +/* + * No synchronization. This function can only be called when only a single + * process remains that performs teardown. + */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { + mn = hlist_entry(mm->mmu_notifier_list.first, + struct mmu_notifier, + hlist); + hlist_del(&mn->hlist); + if (mn->ops->release) + mn->ops->release(mn, mm); + } +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending if the mapping previously + * existed or not. 
+ */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, address); + } + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, start, end); + } +} + +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, start, end); + } +} + +/* + * Must not hold mmap_sem nor any other VM related lock when calling + * this registration function. + */ +int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data *data; + + data = mm_lock(mm); + if (unlikely(IS_ERR(data))) + return PTR_ERR(data); + hlist_add_head(&mn->hlist, &mm->mmu_notifier_list); + mm_unlock(mm, data); + return 0; +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); + +/* + * mm_users can't go down to zero while mmu_notifier_unregister() + * runs or it can race with ->release. So a mm_users pin must + * be taken by the caller (if mm can be different from current->mm). 
+ */ +int mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data *data; + + BUG_ON(!atomic_read(&mm->mm_users)); + + data = mm_lock(mm); + if (unlikely(IS_ERR(data))) + return PTR_ERR(data); + hlist_del(&mn->hlist); + mm_unlock(mm, data); + return 0; +} +EXPORT_SYMBOL_GPL(mmu_notifier_unregister); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) 
referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) From andrea at qumranet.com Tue Apr 8 08:44:09 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:09 +0200 Subject: [ofa-general] [PATCH 6 of 9] We no longer abort unmapping in unmap vmas because we can reschedule while In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666893 -7200 # Node ID b0cb674314534b9cc4759603f123474d38427b2d # Parent 20e829e35dfeceeb55a816ef495afda10cd50b98 We no longer abort unmapping in unmap vmas because we can reschedule while unmapping since we are holding a semaphore. 
This would allow moving more of the tlb flusing into unmap_vmas reducing code in various places. Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -723,8 +723,7 @@ struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t); unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *); -unsigned long unmap_vmas(struct mmu_gather **tlb, - struct vm_area_struct *start_vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *start_vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -805,7 +805,6 @@ /** * unmap_vmas - unmap a range of memory covered by a list of vma's - * @tlbp: address of the caller's struct mmu_gather * @vma: the starting vma * @start_addr: virtual address at which to start unmapping * @end_addr: virtual address at which to end unmapping @@ -817,20 +816,13 @@ * Unmap all pages in the vma list. * * We aim to not hold locks for too long (for scheduling latency reasons). - * So zap pages in ZAP_BLOCK_SIZE bytecounts. This means we need to - * return the ending mmu_gather to the caller. + * So zap pages in ZAP_BLOCK_SIZE bytecounts. * * Only addresses between `start' and `end' will be unmapped. * * The VMA list must be sorted in ascending virtual address order. - * - * unmap_vmas() assumes that the caller will flush the whole unmapped address - * range after unmap_vmas() returns. So the only responsibility here is to - * ensure that any thus-far unmapped pages are flushed before unmap_vmas() - * drops the lock and schedules. 
*/ -unsigned long unmap_vmas(struct mmu_gather **tlbp, - struct vm_area_struct *vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *details) { @@ -838,7 +830,15 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - int fullmm = (*tlbp)->fullmm; + int fullmm; + struct mmu_gather *tlb; + struct mm_struct *mm = vma->vm_mm; + + mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); + lru_add_drain(); + tlb = tlb_gather_mmu(mm, 0); + update_hiwater_rss(mm); + fullmm = tlb->fullmm; for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -865,7 +865,7 @@ (HPAGE_SIZE / PAGE_SIZE); start = end; } else - start = unmap_page_range(*tlbp, vma, + start = unmap_page_range(tlb, vma, start, end, &zap_work, details); if (zap_work > 0) { @@ -873,13 +873,15 @@ break; } - tlb_finish_mmu(*tlbp, tlb_start, start); + tlb_finish_mmu(tlb, tlb_start, start); cond_resched(); - *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); + tlb = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } + tlb_finish_mmu(tlb, start_addr, end_addr); + mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ } @@ -893,20 +895,10 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *details) { - struct mm_struct *mm = vma->vm_mm; - struct mmu_gather *tlb; unsigned long end = address + size; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - mmu_notifier_invalidate_range_start(mm, address, end); - end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); - mmu_notifier_invalidate_range_end(mm, address, end); - if (tlb) - tlb_finish_mmu(tlb, address, end); - 
return end; + return unmap_vmas(vma, address, end, &nr_accounted, details); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1743,19 +1743,12 @@ unsigned long start, unsigned long end) { struct vm_area_struct *next = prev? prev->vm_next: mm->mmap; - struct mmu_gather *tlb; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - mmu_notifier_invalidate_range_start(mm, start, end); - unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); + unmap_vmas(vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, start, end); free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2035,7 +2028,6 @@ /* Release all mmaps. */ void exit_mmap(struct mm_struct *mm) { - struct mmu_gather *tlb; struct vm_area_struct *vma = mm->mmap; unsigned long nr_accounted = 0; unsigned long end; @@ -2046,12 +2038,9 @@ lru_add_drain(); flush_cache_mm(mm); - tlb = tlb_gather_mmu(mm, 1); - /* Don't update_hiwater_rss(mm) here, do_exit already did */ - /* Use -1 here to ensure all VMAs in the mm are unmapped */ - end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); + + end = unmap_vmas(vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, 0, end); free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* From andrea at qumranet.com Tue Apr 8 08:44:08 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:08 +0200 Subject: [ofa-general] [PATCH 5 of 9] The conversion to a rwsem allows callbacks during rmap traversal In-Reply-To: Message-ID: <20e829e35dfeceeb55a8.1207669448@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207666463 -7200 # Node ID 20e829e35dfeceeb55a816ef495afda10cd50b98 # Parent 2c2ed514f294dbbfc66157f771bc900789ac6005 The conversion to a rwsem allows callbacks during rmap traversal for files in a non atomic 
context. A rw style lock also allows concurrent walking of the reverse map. This is fairly straightforward if one removes pieces of the resched checking. [Restarting unmapping is an issue to be discussed]. This slightly increases Aim9 performance results on an 8p. Signed-off-by: Andrea Arcangeli Signed-off-by: Christoph Lameter diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -69,7 +69,7 @@ if (!vma_shareable(vma, addr)) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -94,7 +94,7 @@ put_page(virt_to_page(spte)); spin_unlock(&mm->page_table_lock); out: - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -454,10 +454,10 @@ pgoff = offset >> PAGE_SHIFT; i_size_write(inode, offset); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); if (!prio_tree_empty(&mapping->i_mmap)) hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); truncate_hugepages(inode, offset); return 0; } diff --git a/fs/inode.c b/fs/inode.c --- a/fs/inode.c +++ b/fs/inode.c @@ -210,7 +210,7 @@ INIT_LIST_HEAD(&inode->i_devices); INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); rwlock_init(&inode->i_data.tree_lock); - spin_lock_init(&inode->i_data.i_mmap_lock); + init_rwsem(&inode->i_data.i_mmap_sem); INIT_LIST_HEAD(&inode->i_data.private_list); spin_lock_init(&inode->i_data.private_lock); INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap); diff --git a/include/linux/fs.h b/include/linux/fs.h --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -503,7 +503,7 @@ unsigned int i_mmap_writable;/* count VM_SHARED mappings */ struct prio_tree_root i_mmap; /* tree of private and 
shared mappings */ struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */ - spinlock_t i_mmap_lock; /* protect tree, count, list */ + struct rw_semaphore i_mmap_sem; /* protect tree, count, list */ unsigned int truncate_count; /* Cover race condition with truncate */ unsigned long nrpages; /* number of total pages */ pgoff_t writeback_index;/* writeback starts here */ diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -716,7 +716,7 @@ struct address_space *check_mapping; /* Check page->mapping if set */ pgoff_t first_index; /* Lowest page->index to unmap */ pgoff_t last_index; /* Highest page->index to unmap */ - spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */ + struct rw_semaphore *i_mmap_sem; /* For unmap_mapping_range: */ unsigned long truncate_count; /* Compare vm_truncate_count */ }; @@ -1051,9 +1051,9 @@ unsigned long flags, struct page **pages); struct mm_lock_data { - spinlock_t **i_mmap_locks; + struct rw_semaphore **i_mmap_sems; spinlock_t **anon_vma_locks; - unsigned long nr_i_mmap_locks; + unsigned long nr_i_mmap_sems; unsigned long nr_anon_vma_locks; }; extern struct mm_lock_data *mm_lock(struct mm_struct * mm); diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -274,12 +274,12 @@ atomic_dec(&inode->i_writecount); /* insert tmp into the share list, just after mpnt */ - spin_lock(&file->f_mapping->i_mmap_lock); + down_write(&file->f_mapping->i_mmap_sem); tmp->vm_truncate_count = mpnt->vm_truncate_count; flush_dcache_mmap_lock(file->f_mapping); vma_prio_tree_add(tmp, mpnt); flush_dcache_mmap_unlock(file->f_mapping); - spin_unlock(&file->f_mapping->i_mmap_lock); + up_write(&file->f_mapping->i_mmap_sem); } /* diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -61,16 +61,16 @@ /* * Lock ordering: * - * ->i_mmap_lock (vmtruncate) + * ->i_mmap_sem (vmtruncate) * ->private_lock (__free_pte->__set_page_dirty_buffers) * 
->swap_lock (exclusive_swap_page, others) * ->mapping->tree_lock * * ->i_mutex - * ->i_mmap_lock (truncate->unmap_mapping_range) + * ->i_mmap_sem (truncate->unmap_mapping_range) * * ->mmap_sem - * ->i_mmap_lock + * ->i_mmap_sem * ->page_table_lock or pte_lock (various, mainly in memory.c) * ->mapping->tree_lock (arch-dependent flush_dcache_mmap_lock) * @@ -87,7 +87,7 @@ * ->sb_lock (fs/fs-writeback.c) * ->mapping->tree_lock (__sync_single_inode) * - * ->i_mmap_lock + * ->i_mmap_sem * ->anon_vma.lock (vma_adjust) * * ->anon_vma.lock diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -184,7 +184,7 @@ if (!page) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { mm = vma->vm_mm; address = vma->vm_start + @@ -204,7 +204,7 @@ page_cache_release(page); } } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -206,13 +206,13 @@ } goto out; } - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); flush_dcache_mmap_lock(mapping); vma->vm_flags |= VM_NONLINEAR; vma_prio_tree_remove(vma, &mapping->i_mmap); vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } mmu_notifier_invalidate_range_start(mm, start, start + size); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -790,7 +790,7 @@ struct page *page; struct page *tmp; /* - * A page gathering list, protected by per file i_mmap_lock. The + * A page gathering list, protected by per file i_mmap_sem. The * lock is used to avoid list corruption from multiple unmapping * of the same page since we are using page->lru. */ @@ -840,9 +840,9 @@ * do nothing in this case. 
*/ if (vma->vm_file) { - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); __unmap_hugepage_range(vma, start, end); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); } } @@ -1085,7 +1085,7 @@ BUG_ON(address >= end); flush_cache_range(vma, address, end); - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); spin_lock(&mm->page_table_lock); for (; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -1100,7 +1100,7 @@ } } spin_unlock(&mm->page_table_lock); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); flush_tlb_range(vma, start, end); } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -838,7 +838,6 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; int fullmm = (*tlbp)->fullmm; for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { @@ -875,22 +874,12 @@ } tlb_finish_mmu(*tlbp, tlb_start, start); - - if (need_resched() || - (i_mmap_lock && spin_needbreak(i_mmap_lock))) { - if (i_mmap_lock) { - *tlbp = NULL; - goto out; - } - cond_resched(); - } - + cond_resched(); *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } -out: return start; /* which is now the end (or restart) address */ } @@ -1752,7 +1741,7 @@ /* * Helper functions for unmap_mapping_range(). 
* - * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __ + * __ Notes on dropping i_mmap_sem to reduce latency while unmapping __ * * We have to restart searching the prio_tree whenever we drop the lock, * since the iterator is only valid while the lock is held, and anyway @@ -1771,7 +1760,7 @@ * can't efficiently keep all vmas in step with mapping->truncate_count: * so instead reset them all whenever it wraps back to 0 (then go to 1). * mapping->truncate_count and vma->vm_truncate_count are protected by - * i_mmap_lock. + * i_mmap_sem. * * In order to make forward progress despite repeatedly restarting some * large vma, note the restart_addr from unmap_vmas when it breaks out: @@ -1821,7 +1810,7 @@ restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr, details); - need_break = need_resched() || spin_needbreak(details->i_mmap_lock); + need_break = need_resched(); if (restart_addr >= end_addr) { /* We have now completed this vma: mark it so */ @@ -1835,9 +1824,9 @@ goto again; } - spin_unlock(details->i_mmap_lock); + up_write(details->i_mmap_sem); cond_resched(); - spin_lock(details->i_mmap_lock); + down_write(details->i_mmap_sem); return -EINTR; } @@ -1931,9 +1920,9 @@ details.last_index = hba + hlen - 1; if (details.last_index < details.first_index) details.last_index = ULONG_MAX; - details.i_mmap_lock = &mapping->i_mmap_lock; + details.i_mmap_sem = &mapping->i_mmap_sem; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); /* Protect against endless unmapping loops */ mapping->truncate_count++; @@ -1948,7 +1937,7 @@ unmap_mapping_range_tree(&mapping->i_mmap, &details); if (unlikely(!list_empty(&mapping->i_mmap_nonlinear))) unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } EXPORT_SYMBOL(unmap_mapping_range); diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -211,12 +211,12 @@ if (!mapping) 
return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) remove_migration_pte(vma, old, new); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -188,7 +188,7 @@ } /* - * Requires inode->i_mapping->i_mmap_lock + * Requires inode->i_mapping->i_mmap_sem */ static void __remove_shared_vm_struct(struct vm_area_struct *vma, struct file *file, struct address_space *mapping) @@ -216,9 +216,9 @@ if (file) { struct address_space *mapping = file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); __remove_shared_vm_struct(vma, file, mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } } @@ -441,7 +441,7 @@ mapping = vma->vm_file->f_mapping; if (mapping) { - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); vma->vm_truncate_count = mapping->truncate_count; } anon_vma_lock(vma); @@ -451,7 +451,7 @@ anon_vma_unlock(vma); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mm->map_count++; validate_mm(mm); @@ -538,7 +538,7 @@ mapping = file->f_mapping; if (!(vma->vm_flags & VM_NONLINEAR)) root = &mapping->i_mmap; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (importer && vma->vm_truncate_count != next->vm_truncate_count) { /* @@ -622,7 +622,7 @@ if (anon_vma) spin_unlock(&anon_vma->lock); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); if (remove_next) { if (file) @@ -2066,7 +2066,7 @@ /* Insert vm structure into process list sorted by address * and into the inode's i_mmap tree. If vm_file is non-NULL - * then i_mmap_lock is taken here. + * then i_mmap_sem is taken here. 
*/ int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma) { @@ -2258,22 +2258,23 @@ struct mm_lock_data *mm_lock(struct mm_struct * mm) { struct vm_area_struct *vma; - spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; - unsigned long nr_i_mmap_locks, nr_anon_vma_locks, i; + struct rw_semaphore *i_mmap_sem_last; + spinlock_t *anon_vma_lock_last; + unsigned long nr_i_mmap_sems, nr_anon_vma_locks, i; struct mm_lock_data *data; int err; down_write(&mm->mmap_sem); err = -EINTR; - nr_i_mmap_locks = nr_anon_vma_locks = 0; + nr_i_mmap_sems = nr_anon_vma_locks = 0; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); if (unlikely(signal_pending(current))) goto out; if (vma->vm_file && vma->vm_file->f_mapping) - nr_i_mmap_locks++; + nr_i_mmap_sems++; if (vma->anon_vma) nr_anon_vma_locks++; } @@ -2283,13 +2284,13 @@ if (!data) goto out; - if (nr_i_mmap_locks) { - data->i_mmap_locks = vmalloc(nr_i_mmap_locks * - sizeof(spinlock_t)); - if (!data->i_mmap_locks) + if (nr_i_mmap_sems) { + data->i_mmap_sems = vmalloc(nr_i_mmap_sems * + sizeof(struct rw_semaphore)); + if (!data->i_mmap_sems) goto out_kfree; } else - data->i_mmap_locks = NULL; + data->i_mmap_sems = NULL; if (nr_anon_vma_locks) { data->anon_vma_locks = vmalloc(nr_anon_vma_locks * @@ -2300,10 +2301,11 @@ data->anon_vma_locks = NULL; err = -EINTR; - i_mmap_lock_last = NULL; - nr_i_mmap_locks = 0; + i_mmap_sem_last = NULL; + nr_i_mmap_sems = 0; for (;;) { - spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; + struct rw_semaphore *i_mmap_sem; + i_mmap_sem = (struct rw_semaphore *) -1UL; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); if (unlikely(signal_pending(current))) @@ -2311,21 +2313,21 @@ if (!vma->vm_file || !vma->vm_file->f_mapping) continue; - if ((unsigned long) i_mmap_lock > + if ((unsigned long) i_mmap_sem > (unsigned long) - &vma->vm_file->f_mapping->i_mmap_lock && + &vma->vm_file->f_mapping->i_mmap_sem && (unsigned long) - &vma->vm_file->f_mapping->i_mmap_lock > - 
(unsigned long) i_mmap_lock_last) - i_mmap_lock = - &vma->vm_file->f_mapping->i_mmap_lock; + &vma->vm_file->f_mapping->i_mmap_sem > + (unsigned long) i_mmap_sem_last) + i_mmap_sem = + &vma->vm_file->f_mapping->i_mmap_sem; } - if (i_mmap_lock == (spinlock_t *) -1UL) + if (i_mmap_sem == (struct rw_semaphore *) -1UL) break; - i_mmap_lock_last = i_mmap_lock; - data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock; + i_mmap_sem_last = i_mmap_sem; + data->i_mmap_sems[nr_i_mmap_sems++] = i_mmap_sem; } - data->nr_i_mmap_locks = nr_i_mmap_locks; + data->nr_i_mmap_sems = nr_i_mmap_sems; anon_vma_lock_last = NULL; nr_anon_vma_locks = 0; @@ -2351,8 +2353,8 @@ } data->nr_anon_vma_locks = nr_anon_vma_locks; - for (i = 0; i < nr_i_mmap_locks; i++) - spin_lock(data->i_mmap_locks[i]); + for (i = 0; i < nr_i_mmap_sems; i++) + down_write(data->i_mmap_sems[i]); for (i = 0; i < nr_anon_vma_locks; i++) spin_lock(data->anon_vma_locks[i]); @@ -2361,7 +2363,7 @@ out_vfree_both: vfree(data->anon_vma_locks); out_vfree: - vfree(data->i_mmap_locks); + vfree(data->i_mmap_sems); out_kfree: kfree(data); out: @@ -2373,14 +2375,14 @@ { unsigned long i; - for (i = 0; i < data->nr_i_mmap_locks; i++) - spin_unlock(data->i_mmap_locks[i]); + for (i = 0; i < data->nr_i_mmap_sems; i++) + up_write(data->i_mmap_sems[i]); for (i = 0; i < data->nr_anon_vma_locks; i++) spin_unlock(data->anon_vma_locks[i]); up_write(&mm->mmap_sem); - vfree(data->i_mmap_locks); + vfree(data->i_mmap_sems); vfree(data->anon_vma_locks); kfree(data); } diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -88,7 +88,7 @@ * and we propagate stale pages into the dst afterward. 
*/ mapping = vma->vm_file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (new_vma->vm_truncate_count && new_vma->vm_truncate_count != vma->vm_truncate_count) new_vma->vm_truncate_count = 0; @@ -120,7 +120,7 @@ pte_unmap_nested(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -24,7 +24,7 @@ * inode->i_alloc_sem (vmtruncate_range) * mm->mmap_sem * page->flags PG_locked (lock_page) - * mapping->i_mmap_lock + * mapping->i_mmap_sem * anon_vma->lock * mm->page_table_lock or pte_lock * zone->lru_lock (in mark_page_accessed, isolate_lru_page) @@ -373,14 +373,14 @@ * The page lock not only makes sure that page->mapping cannot * suddenly be NULLified by truncation, it makes sure that the * structure at mapping cannot be freed and reused yet, - * so we can safely take mapping->i_mmap_lock. + * so we can safely take mapping->i_mmap_sem. */ BUG_ON(!PageLocked(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); /* - * i_mmap_lock does not stabilize mapcount at all, but mapcount + * i_mmap_sem does not stabilize mapcount at all, but mapcount * is more likely to be accurate if we note it after spinning. 
*/ mapcount = page_mapcount(page); @@ -403,7 +403,7 @@ break; } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return referenced; } @@ -489,12 +489,12 @@ BUG_ON(PageAnon(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { if (vma->vm_flags & VM_SHARED) ret += page_mkclean_one(page, vma); } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return ret; } @@ -930,7 +930,7 @@ unsigned long max_nl_size = 0; unsigned int mapcount; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { ret = try_to_unmap_one(page, vma, migration); if (ret == SWAP_FAIL || !page_mapped(page)) @@ -967,7 +967,6 @@ mapcount = page_mapcount(page); if (!mapcount) goto out; - cond_resched_lock(&mapping->i_mmap_lock); max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK; if (max_nl_cursor == 0) @@ -989,7 +988,6 @@ } vma->vm_private_data = (void *) max_nl_cursor; } - cond_resched_lock(&mapping->i_mmap_lock); max_nl_cursor += CLUSTER_SIZE; } while (max_nl_cursor <= max_nl_size); @@ -1001,7 +999,7 @@ list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) vma->vm_private_data = NULL; out: - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return ret; } From andrea at qumranet.com Tue Apr 8 08:44:10 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:10 +0200 Subject: [ofa-general] [PATCH 7 of 9] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666968 -7200 # Node ID a0c52e4b9b71e2627238b69c0a58905097973279 # Parent b0cb674314534b9cc4759603f123474d38427b2d Convert the anon_vma spinlock to a rw semaphore. This allows concurrent traversal of reverse maps for try_to_unmap and page_mkclean.
It also allows the calling of sleeping functions from reverse map traversal. An additional complication is that rcu is used in some contexts to guarantee the presence of the anon_vma while we acquire the lock. We cannot take a semaphore within an rcu critical section. Add a refcount to the anon_vma structure which allows us to give an existence guarantee for the anon_vma structure independent of the spinlock or the list contents. The refcount can then be taken within the RCU section. If it has been taken successfully then the refcount guarantees the existence of the anon_vma. The refcount in anon_vma also allows us to fix a nasty issue in page migration where we fudged by using rcu for a long code path to guarantee the existence of the anon_vma. The refcount in general allows a shortening of RCU critical sections since we can do an rcu_read_unlock after taking the refcount. This is particularly relevant if the anon_vma chains contain hundreds of entries. Issues: - Atomic overhead increases in situations where a new reference to the anon_vma has to be established or removed. Overhead also increases when a speculative reference is used (try_to_unmap, page_mkclean, page migration). There is also more frequent processor switching due to up_xxx letting waiting tasks run first. This results in, e.g., the Aim9 brk performance test going down by 10-15%.
Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1051,9 +1051,9 @@ struct mm_lock_data { struct rw_semaphore **i_mmap_sems; - spinlock_t **anon_vma_locks; + struct rw_semaphore **anon_vma_sems; unsigned long nr_i_mmap_sems; - unsigned long nr_anon_vma_locks; + unsigned long nr_anon_vma_sems; }; extern struct mm_lock_data *mm_lock(struct mm_struct * mm); extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); diff --git a/include/linux/rmap.h b/include/linux/rmap.h --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -25,7 +25,8 @@ * pointing to this anon_vma once its vma list is empty. */ struct anon_vma { - spinlock_t lock; /* Serialize access to vma list */ + atomic_t refcount; /* vmas on the list */ + struct rw_semaphore sem;/* Serialize access to vma list */ struct list_head head; /* List of private "related" vmas */ }; @@ -43,18 +44,31 @@ kmem_cache_free(anon_vma_cachep, anon_vma); } +struct anon_vma *grab_anon_vma(struct page *page); + +static inline void get_anon_vma(struct anon_vma *anon_vma) +{ + atomic_inc(&anon_vma->refcount); +} + +static inline void put_anon_vma(struct anon_vma *anon_vma) +{ + if (atomic_dec_and_test(&anon_vma->refcount)) + anon_vma_free(anon_vma); +} + static inline void anon_vma_lock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); } static inline void anon_vma_unlock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } /* diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -235,15 +235,16 @@ return; /* - * We hold the mmap_sem lock. So no need to call page_lock_anon_vma. + * We hold either the mmap_sem lock or a reference on the + * anon_vma. So no need to call page_lock_anon_vma. 
*/ anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); + down_read(&anon_vma->sem); list_for_each_entry(vma, &anon_vma->head, anon_vma_node) remove_migration_pte(vma, old, new); - spin_unlock(&anon_vma->lock); + up_read(&anon_vma->sem); } /* @@ -623,7 +624,7 @@ int rc = 0; int *result = NULL; struct page *newpage = get_new_page(page, private, &result); - int rcu_locked = 0; + struct anon_vma *anon_vma = NULL; int charge = 0; if (!newpage) @@ -647,16 +648,14 @@ } /* * By try_to_unmap(), page->mapcount goes down to 0 here. In this case, - * we cannot notice that anon_vma is freed while we migrates a page. + * we cannot notice that anon_vma is freed while we migrate a page. * This rcu_read_lock() delays freeing anon_vma pointer until the end * of migration. File cache pages are no problem because of page_lock() * File Caches may use write_page() or lock_page() in migration, then, * just care Anon page here. */ - if (PageAnon(page)) { - rcu_read_lock(); - rcu_locked = 1; - } + if (PageAnon(page)) + anon_vma = grab_anon_vma(page); /* * Corner case handling: @@ -674,10 +673,7 @@ if (!PageAnon(page) && PagePrivate(page)) { /* * Go direct to try_to_free_buffers() here because - * a) that's what try_to_release_page() would do anyway - * b) we may be under rcu_read_lock() here, so we can't - * use GFP_KERNEL which is what try_to_release_page() - * needs to be effective. 
+ * that's what try_to_release_page() would do anyway */ try_to_free_buffers(page); } @@ -698,8 +694,8 @@ } else if (charge) mem_cgroup_end_migration(newpage); rcu_unlock: - if (rcu_locked) - rcu_read_unlock(); + if (anon_vma) + put_anon_vma(anon_vma); unlock: diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -566,7 +566,7 @@ if (vma->anon_vma) anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); /* * Easily overlooked: when mprotect shifts the boundary, * make sure the expanding vma has anon_vma set if the @@ -620,7 +620,7 @@ } if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); if (mapping) up_write(&mapping->i_mmap_sem); @@ -2247,16 +2247,15 @@ struct mm_lock_data *mm_lock(struct mm_struct * mm) { struct vm_area_struct *vma; - struct rw_semaphore *i_mmap_sem_last; - spinlock_t *anon_vma_lock_last; - unsigned long nr_i_mmap_sems, nr_anon_vma_locks, i; + struct rw_semaphore *i_mmap_sem_last, *anon_vma_sem_last; + unsigned long nr_i_mmap_sems, nr_anon_vma_sems, i; struct mm_lock_data *data; int err; down_write(&mm->mmap_sem); err = -EINTR; - nr_i_mmap_sems = nr_anon_vma_locks = 0; + nr_i_mmap_sems = nr_anon_vma_sems = 0; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); if (unlikely(signal_pending(current))) @@ -2265,7 +2264,7 @@ if (vma->vm_file && vma->vm_file->f_mapping) nr_i_mmap_sems++; if (vma->anon_vma) - nr_anon_vma_locks++; + nr_anon_vma_sems++; } err = -ENOMEM; @@ -2281,13 +2280,13 @@ } else data->i_mmap_sems = NULL; - if (nr_anon_vma_locks) { - data->anon_vma_locks = vmalloc(nr_anon_vma_locks * - sizeof(spinlock_t)); - if (!data->anon_vma_locks) + if (nr_anon_vma_sems) { + data->anon_vma_sems = vmalloc(nr_anon_vma_sems * + sizeof(struct rw_semaphore)); + if (!data->anon_vma_sems) goto out_vfree; } else - data->anon_vma_locks = NULL; + data->anon_vma_sems = NULL; err = -EINTR; i_mmap_sem_last = NULL; @@ -2318,10 +2317,11 @@ } data->nr_i_mmap_sems = 
nr_i_mmap_sems; - anon_vma_lock_last = NULL; - nr_anon_vma_locks = 0; + anon_vma_sem_last = NULL; + nr_anon_vma_sems = 0; for (;;) { - spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; + struct rw_semaphore *anon_vma_sem; + anon_vma_sem = (struct rw_semaphore *) -1UL; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); if (unlikely(signal_pending(current))) @@ -2329,28 +2329,28 @@ if (!vma->anon_vma) continue; - if ((unsigned long) anon_vma_lock > - (unsigned long) &vma->anon_vma->lock && - (unsigned long) &vma->anon_vma->lock > - (unsigned long) anon_vma_lock_last) - anon_vma_lock = &vma->anon_vma->lock; + if ((unsigned long) anon_vma_sem > + (unsigned long) &vma->anon_vma->sem && + (unsigned long) &vma->anon_vma->sem > + (unsigned long) anon_vma_sem_last) + anon_vma_sem = &vma->anon_vma->sem; } - if (anon_vma_lock == (spinlock_t *) -1UL) + if (anon_vma_sem == (struct rw_semaphore *) -1UL) break; - anon_vma_lock_last = anon_vma_lock; - data->anon_vma_locks[nr_anon_vma_locks++] = anon_vma_lock; + anon_vma_sem_last = anon_vma_sem; + data->anon_vma_sems[nr_anon_vma_sems++] = anon_vma_sem; } - data->nr_anon_vma_locks = nr_anon_vma_locks; + data->nr_anon_vma_sems = nr_anon_vma_sems; for (i = 0; i < nr_i_mmap_sems; i++) down_write(data->i_mmap_sems[i]); - for (i = 0; i < nr_anon_vma_locks; i++) - spin_lock(data->anon_vma_locks[i]); + for (i = 0; i < nr_anon_vma_sems; i++) + down_write(data->anon_vma_sems[i]); return data; out_vfree_both: - vfree(data->anon_vma_locks); + vfree(data->anon_vma_sems); out_vfree: vfree(data->i_mmap_sems); out_kfree: @@ -2366,12 +2366,12 @@ for (i = 0; i < data->nr_i_mmap_sems; i++) up_write(data->i_mmap_sems[i]); - for (i = 0; i < data->nr_anon_vma_locks; i++) - spin_unlock(data->anon_vma_locks[i]); + for (i = 0; i < data->nr_anon_vma_sems; i++) + up_write(data->anon_vma_sems[i]); up_write(&mm->mmap_sem); vfree(data->i_mmap_sems); - vfree(data->anon_vma_locks); + vfree(data->anon_vma_sems); kfree(data); } diff --git a/mm/rmap.c 
b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -69,7 +69,7 @@ if (anon_vma) { allocated = NULL; locked = anon_vma; - spin_lock(&locked->lock); + down_write(&locked->sem); } else { anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) @@ -81,6 +81,7 @@ /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { + get_anon_vma(anon_vma); vma->anon_vma = anon_vma; list_add_tail(&vma->anon_vma_node, &anon_vma->head); allocated = NULL; @@ -88,7 +89,7 @@ spin_unlock(&mm->page_table_lock); if (locked) - spin_unlock(&locked->lock); + up_write(&locked->sem); if (unlikely(allocated)) anon_vma_free(allocated); } @@ -99,14 +100,17 @@ { BUG_ON(vma->anon_vma != next->anon_vma); list_del(&next->anon_vma_node); + put_anon_vma(vma->anon_vma); } void __anon_vma_link(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - if (anon_vma) + if (anon_vma) { + get_anon_vma(anon_vma); list_add_tail(&vma->anon_vma_node, &anon_vma->head); + } } void anon_vma_link(struct vm_area_struct *vma) @@ -114,36 +118,32 @@ struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + get_anon_vma(anon_vma); + down_write(&anon_vma->sem); list_add_tail(&vma->anon_vma_node, &anon_vma->head); - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } } void anon_vma_unlink(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - int empty; if (!anon_vma) return; - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); list_del(&vma->anon_vma_node); - - /* We must garbage collect the anon_vma if it's empty */ - empty = list_empty(&anon_vma->head); - spin_unlock(&anon_vma->lock); - - if (empty) - anon_vma_free(anon_vma); + up_write(&anon_vma->sem); + put_anon_vma(anon_vma); } static void anon_vma_ctor(struct kmem_cache *cachep, void *data) { struct anon_vma *anon_vma = data; - spin_lock_init(&anon_vma->lock); + init_rwsem(&anon_vma->sem); + atomic_set(&anon_vma->refcount, 0); 
INIT_LIST_HEAD(&anon_vma->head); } @@ -157,9 +157,9 @@ * Getting a lock on a stable anon_vma from a page off the LRU is * tricky: page_lock_anon_vma rely on RCU to guard against the races. */ -static struct anon_vma *page_lock_anon_vma(struct page *page) +struct anon_vma *grab_anon_vma(struct page *page) { - struct anon_vma *anon_vma; + struct anon_vma *anon_vma = NULL; unsigned long anon_mapping; rcu_read_lock(); @@ -170,17 +170,26 @@ goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); - return anon_vma; + if (!atomic_inc_not_zero(&anon_vma->refcount)) + anon_vma = NULL; out: rcu_read_unlock(); - return NULL; + return anon_vma; +} + +static struct anon_vma *page_lock_anon_vma(struct page *page) +{ + struct anon_vma *anon_vma = grab_anon_vma(page); + + if (anon_vma) + down_read(&anon_vma->sem); + return anon_vma; } static void page_unlock_anon_vma(struct anon_vma *anon_vma) { - spin_unlock(&anon_vma->lock); - rcu_read_unlock(); + up_read(&anon_vma->sem); + put_anon_vma(anon_vma); } /* From andrea at qumranet.com Tue Apr 8 08:44:11 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:11 +0200 Subject: [ofa-general] [PATCH 8 of 9] XPMEM would have used sys_madvise() except that madvise_dontneed() In-Reply-To: Message-ID: <3b14e26a4e0491f00bb9.1207669451@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207666972 -7200 # Node ID 3b14e26a4e0491f00bb989be04d8b7e0755ed2d7 # Parent a0c52e4b9b71e2627238b69c0a58905097973279 XPMEM would have used sys_madvise() except that madvise_dontneed() returns an -EINVAL if VM_PFNMAP is set, which is always true for the pages XPMEM imports from other partitions and is also true for uncached pages allocated locally via the mspec allocator. XPMEM needs zap_page_range() functionality for these types of pages as well as 'normal' pages. 
Signed-off-by: Dean Nelson diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -900,6 +900,7 @@ return unmap_vmas(vma, address, end, &nr_accounted, details); } +EXPORT_SYMBOL_GPL(zap_page_range); /* * Do a quick page-table lookup for a single page. From andrea at qumranet.com Tue Apr 8 08:44:12 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:12 +0200 Subject: [ofa-general] [PATCH 9 of 9] This patch adds a lock ordering rule to avoid a potential deadlock when In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666972 -7200 # Node ID bd55023b22769ecb14b26c2347947f7d6d63bcea # Parent 3b14e26a4e0491f00bb989be04d8b7e0755ed2d7 This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need to be locked. Signed-off-by: Dean Nelson diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -79,6 +79,9 @@ * * ->i_mutex (generic_file_buffered_write) * ->mmap_sem (fault_in_pages_readable->do_page_fault) + * + * When taking multiple mmap_sems, one should lock the lowest-addressed + * one first proceeding on up to the highest-addressed one. * * ->i_mutex * ->i_alloc_sem (various) From holt at sgi.com Tue Apr 8 09:26:19 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 8 Apr 2008 11:26:19 -0500 Subject: [ofa-general] Re: [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080408162619.GP11364@sgi.com> This one does not build on ia64. 
I get the following: [holt at attica mmu_v12_xpmem_v003_v1]$ make compressed CHK include/linux/version.h CHK include/linux/utsrelease.h CALL scripts/checksyscalls.sh CHK include/linux/compile.h CC mm/mmu_notifier.o In file included from include/linux/mmu_notifier.h:6, from mm/mmu_notifier.c:12: include/linux/mm_types.h:200: error: expected specifier-qualifier-list before ‘cpumask_t’ In file included from mm/mmu_notifier.c:12: include/linux/mmu_notifier.h: In function ‘mm_has_notifiers’: include/linux/mmu_notifier.h:62: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ include/linux/mmu_notifier.h: In function ‘mmu_notifier_mm_init’: include/linux/mmu_notifier.h:117: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ In file included from include/asm/pgtable.h:155, from include/linux/mm.h:39, from mm/mmu_notifier.c:14: include/asm/mmu_context.h: In function ‘get_mmu_context’: include/asm/mmu_context.h:81: error: ‘struct mm_struct’ has no member named ‘context’ include/asm/mmu_context.h:88: error: ‘struct mm_struct’ has no member named ‘context’ include/asm/mmu_context.h:90: error: ‘struct mm_struct’ has no member named ‘cpu_vm_mask’ include/asm/mmu_context.h:99: error: ‘struct mm_struct’ has no member named ‘context’ include/asm/mmu_context.h: In function ‘init_new_context’: include/asm/mmu_context.h:120: error: ‘struct mm_struct’ has no member named ‘context’ include/asm/mmu_context.h: In function ‘activate_context’: include/asm/mmu_context.h:173: error: ‘struct mm_struct’ has no member named ‘cpu_vm_mask’ include/asm/mmu_context.h:174: error: ‘struct mm_struct’ has no member named ‘cpu_vm_mask’ include/asm/mmu_context.h:180: error: ‘struct mm_struct’ has no member named ‘context’ mm/mmu_notifier.c: In function ‘__mmu_notifier_release’: mm/mmu_notifier.c:25: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c:26: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In 
function ‘__mmu_notifier_clear_flush_young’: mm/mmu_notifier.c:47: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_page’: mm/mmu_notifier.c:61: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_range_start’: mm/mmu_notifier.c:73: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_range_end’: mm/mmu_notifier.c:85: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In function ‘mmu_notifier_register’: mm/mmu_notifier.c:102: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ make[1]: *** [mm/mmu_notifier.o] Error 1 make: *** [mm] Error 2 From andrea at qumranet.com Tue Apr 8 10:05:25 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 8 Apr 2008 19:05:25 +0200 Subject: [ofa-general] Re: [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: <20080408162619.GP11364@sgi.com> References: <20080408162619.GP11364@sgi.com> Message-ID: <20080408170525.GN10133@duo.random> On Tue, Apr 08, 2008 at 11:26:19AM -0500, Robin Holt wrote: > This one does not build on ia64. I get the following: I think it's a common code compilation bug not related to my patch. Can you test this? 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -10,6 +10,7 @@ #include #include #include +#include <linux/cpumask.h> #include #include From kliteyn at mellanox.co.il Tue Apr 8 13:22:38 2008 From: kliteyn at mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 08 Apr 2008 23:22:38 +0300 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <20080408183113.GA18308@sashak.voltaire.com> References: <200804041147.27565.bs@q-leap.de> <20080408014406.GA16864@sashak.voltaire.com> <200804081135.35846.bs@q-leap.de> <20080408183113.GA18308@sashak.voltaire.com> Message-ID: <47FBD40E.70407@mellanox.co.il> Sasha Khapyorsky wrote: > Hi Bernd, > > [adding Yevgeny..] > > On 11:35 Tue 08 Apr , Bernd Schubert wrote: > >> On Tuesday 08 April 2008 03:44:06 Sasha Khapyorsky wrote: >> >>> Hi Bernd, >>> >>> On 11:47 Fri 04 Apr , Bernd Schubert wrote: >>> >>>> opensm-3.2.1 logs some error messages like this: >>>> >>>> Apr 04 00:00:08 325114 [4580A960] 0x01 -> >>>> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node >>>> 0 >>>> x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling >>>> list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 >>>> hop path: Path = 0,1,14,13 >>>> >>>> >>>> From ibnetdiscover output I see port13 of this switch is a >>>> switch-interconnect (sorry, I don't know what the correct name/identifier >>>> for switches within switches): >>>> >>>> [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid >>>> 263 4xSDR >>>> >>> It is possible that port was DOWN during first subnet discovery. Finally >>> everything should be initialized after those messages. Isn't it the case >>> here? >>> >> I think everything is initialized, but I don't think the port was down during >> first subnet discovery, since the port is on a spine board (I called >> it 'inter') to another switch system. We also never added any leaves to the >> switches.
>> > > It is an interesting phenomenon then. > > Yevgeny, are you aware of such an issue with Flextronics switches? > > I've seen it before. It means that during discovery some switch has answered the NodeInfo query, but then when OpenSM started to query for PortInfo for each port of this switch, the switch didn't answer for some (or all) ports. I think that this might happen if a switch has just been "plugged in", and internal switches are doing autonegotiation - they are bringing ports up and down when determining whether a link is SDR or DDR. In any case, this "phenomenon" should disappear after a couple dozen seconds, once the autonegotiation phase is over. Bernd, am I close? -- Yevgeny > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From clameter at sgi.com Tue Apr 8 13:23:33 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 8 Apr 2008 13:23:33 -0700 (PDT) Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic In-Reply-To: <20080407071330.GH9309@duo.random> References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> <20080407060602.GE9309@duo.random> <20080407071330.GH9309@duo.random> Message-ID: It may also be useful to allow invalidate_start() to fail in some contexts (try_to_unmap f.e., maybe if a certain flag is passed). This may allow the device to get out of tight situations (pending I/O f.e., or timing out when there is no response from network communications). But then that complicates the API.
From Frank.Leers at Sun.COM Tue Apr 8 13:58:21 2008 From: Frank.Leers at Sun.COM (Frank Leers) Date: Tue, 08 Apr 2008 13:58:21 -0700 Subject: [ofa-general] install.sh question Message-ID: <1207688301.1661.86.camel@localhost> Hi all, I'd like to be able to use the provided install.sh from cluster nodes to install from a build which is shared over NFS, while utilizing an ofed_net.conf. The Install Guide talks about this, but I must be missing something in the details. Is there a way to skip the check of whether a build needs to be (re)done and simply install the RPMs that were created during the original build, then create the ifcfg-ib? devices based on the template file passed in with -net ? I prefer not to have kernel sources, compiler, etc. on these compute nodes, nor should I have to recompile for each homogeneous node. thanks, -frank From avi at qumranet.com Tue Apr 8 14:46:49 2008 From: avi at qumranet.com (Avi Kivity) Date: Wed, 09 Apr 2008 00:46:49 +0300 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: References: Message-ID: <47FBE7C9.9000701@qumranet.com> Andrea Arcangeli wrote: > Note that mmu_notifier_unregister may also fail with -EINTR if there are > signals pending or the system runs out of vmalloc space or physical memory; > only exit_mmap guarantees that any kernel module can be unloaded in the presence > of an OOM condition. > > That's unusual. What happens to the notifier? Suppose I destroy a vm without exiting the process, what happens if it fires? -- Any sufficiently difficult bug is indistinguishable from a feature. From andrea at qumranet.com Tue Apr 8 15:06:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 9 Apr 2008 00:06:27 +0200 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <47FBE7C9.9000701@qumranet.com> References: <47FBE7C9.9000701@qumranet.com> Message-ID: <20080408220627.GP10133@duo.random> On Wed, Apr 09, 2008 at 12:46:49AM +0300, Avi Kivity wrote: > That's unusual.
What happens to the notifier? Suppose I destroy a vm Yes, it's quite unusual. > without exiting the process, what happens if it fires? The mmu notifier ops should stop doing anything (if there are no memslots they will be no-ops), or the ops can be replaced atomically with null pointers. The important thing is that the module can't go away until ->release is invoked or until mmu_notifier_unregister has returned 0. Previously there was no mmu_notifier_unregister, so adding it can't be a regression compared to #v11, even if it can fail and you may have to retry later after returning to userland. Retrying from userland is always safe in OOM-kill terms; only looping inside the kernel isn't safe, as do_exit has no chance to run. From sashak at voltaire.com Tue Apr 8 18:10:21 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:21 +0000 Subject: [ofa-general] [RFC][PATCH 0/4] opensm: using conventional config file Message-ID: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Hi, This is an attempt to bring some order to OpenSM configuration. OpenSM will now use a conventional config file ($sysconfdir/opensm/opensm.conf), similar to other programs that have configuration, instead of the option cache file. The config file used by some startup scripts should go away. Option '-c' is preserved - it can be useful for config file template generation, but OpenSM will not try to read the option cache file. This is still an RFC. In addition, we will need to update the scripts and man pages. Any feedback? Thoughts?
Sasha From sashak at voltaire.com Tue Apr 8 18:10:22 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:22 +0000 Subject: [ofa-general] [PATCH 1/4] opensm: pass file name as parameter to config parser funcs In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207703425-19039-2-git-send-email-sashak@voltaire.com> Functions osm_subn_parse_conf_file() and osm_subn_write_conf_file() will get config file name as parameter. Also it is stored as part of config options and used by osm_subn_rescan_conf_files(). Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_subnet.h | 10 +++++- opensm/opensm/main.c | 13 +++++++- opensm/opensm/osm_subnet.c | 53 ++++++++--------------------------- 3 files changed, 31 insertions(+), 45 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index b1dd659..98afbd4 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -205,6 +205,7 @@ typedef struct _osm_qos_options_t { * SYNOPSIS */ typedef struct _osm_subn_opt { + char *config_file; ib_net64_t guid; ib_net64_t m_key; ib_net64_t sm_key; @@ -289,6 +290,9 @@ typedef struct _osm_subn_opt { /* * FIELDS * +* config_file +* The name of the config file. +* * guid * The port guid that the SM is binding to. 
* @@ -1057,7 +1061,8 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt); * * SYNOPSIS */ -ib_api_status_t osm_subn_parse_conf_file(IN osm_subn_opt_t * const p_opt); +ib_api_status_t osm_subn_parse_conf_file(char *conf_file, + IN osm_subn_opt_t * const p_opt); /* * PARAMETERS * @@ -1109,7 +1114,8 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); * * SYNOPSIS */ -ib_api_status_t osm_subn_write_conf_file(IN osm_subn_opt_t * const p_opt); +ib_api_status_t osm_subn_write_conf_file(char *file_name, + IN osm_subn_opt_t * const p_opt); /* * PARAMETERS * diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index fb41d50..91ee143 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -589,6 +589,8 @@ int main(int argc, char *argv[]) { osm_opensm_t osm; osm_subn_opt_t opt; + char conf_file[256]; + char *cache_dir; ib_net64_t sm_key = 0; ib_api_status_t status; uint32_t temp, dbg_lvl; @@ -674,7 +676,14 @@ int main(int argc, char *argv[]) printf("%s\n", OSM_VERSION); osm_subn_set_default_opt(&opt); - if (osm_subn_parse_conf_file(&opt) != IB_SUCCESS) + + /* try to open the options file from the cache dir */ + cache_dir = getenv("OSM_CACHE_DIR"); + if (!cache_dir || !(*cache_dir)) + cache_dir = OSM_DEFAULT_CACHE_DIR; + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); + + if (osm_subn_parse_conf_file(conf_file, &opt) != IB_SUCCESS) printf("\nosm_subn_parse_conf_file failed!\n"); printf("Command Line Arguments:\n"); @@ -1013,7 +1022,7 @@ int main(int argc, char *argv[]) opt.guid = get_port_guid(&osm, opt.guid); if (cache_options == TRUE - && osm_subn_write_conf_file(&opt) != IB_SUCCESS) + && osm_subn_write_conf_file(conf_file, &opt) != IB_SUCCESS) printf("\nosm_subn_write_conf_file failed!\n"); status = osm_opensm_bind(&osm, opt.guid); diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 47d735f..f3f4c52 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -71,14 
+71,6 @@ #include #include -#if defined(PATH_MAX) -#define OSM_PATH_MAX (PATH_MAX + 1) -#elif defined (_POSIX_PATH_MAX) -#define OSM_PATH_MAX (_POSIX_PATH_MAX + 1) -#else -#define OSM_PATH_MAX 256 -#endif - /********************************************************************** **********************************************************************/ void osm_subn_construct(IN osm_subn_t * const p_subn) @@ -787,26 +779,20 @@ osm_parse_prefix_routes_file(IN osm_subn_t * const p_subn) **********************************************************************/ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) { - char *p_cache_dir = getenv("OSM_CACHE_DIR"); - char file_name[OSM_PATH_MAX]; FILE *opts_file; char line[1024]; char *p_key, *p_val, *p_last; - /* try to open the options file from the cache dir */ - if (!p_cache_dir || !(*p_cache_dir)) - p_cache_dir = OSM_DEFAULT_CACHE_DIR; + if (!p_subn->opt.config_file) + return 0; - strcpy(file_name, p_cache_dir); - strcat(file_name, "/opensm.opts"); - - opts_file = fopen(file_name, "r"); + opts_file = fopen(p_subn->opt.config_file, "r"); if (!opts_file) { if (errno == ENOENT) return IB_SUCCESS; OSM_LOG(&p_subn->p_osm->log, OSM_LOG_ERROR, "cannot open file \'%s\': %s\n", - file_name, strerror(errno)); + p_subn->opt.config_file, strerror(errno)); return IB_ERROR; } @@ -1142,21 +1128,13 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) /********************************************************************** **********************************************************************/ -ib_api_status_t osm_subn_parse_conf_file(IN osm_subn_opt_t * const p_opts) +ib_api_status_t osm_subn_parse_conf_file(char *file_name, + IN osm_subn_opt_t * const p_opts) { - char *p_cache_dir = getenv("OSM_CACHE_DIR"); - char file_name[OSM_PATH_MAX]; - FILE *opts_file; char line[1024]; + FILE *opts_file; char *p_key, *p_val, *p_last; - /* try to open the options file from the cache dir */ - if (!p_cache_dir || 
!(*p_cache_dir)) - p_cache_dir = OSM_DEFAULT_CACHE_DIR; - - strcpy(file_name, p_cache_dir); - strcat(file_name, "/opensm.opts"); - opts_file = fopen(file_name, "r"); if (!opts_file) { if (errno == ENOENT) @@ -1166,10 +1144,11 @@ ib_api_status_t osm_subn_parse_conf_file(IN osm_subn_opt_t * const p_opts) return IB_ERROR; } - sprintf(line, " Reading Cached Option File: %s\n", file_name); - printf(line); + printf(" Reading Cached Option File: %s\n", file_name); cl_log_event("OpenSM", CL_LOG_INFO, line, NULL, 0); + p_opts->config_file = file_name; + while (fgets(line, 1023, opts_file) != NULL) { /* get the first token */ p_key = strtok_r(line, " \t\n", &p_last); @@ -1405,19 +1384,11 @@ ib_api_status_t osm_subn_parse_conf_file(IN osm_subn_opt_t * const p_opts) /********************************************************************** **********************************************************************/ -ib_api_status_t osm_subn_write_conf_file(IN osm_subn_opt_t * const p_opts) +ib_api_status_t osm_subn_write_conf_file(char *file_name, + IN osm_subn_opt_t * const p_opts) { - char *p_cache_dir = getenv("OSM_CACHE_DIR"); - char file_name[OSM_PATH_MAX]; FILE *opts_file; - /* try to open the options file from the cache dir */ - if (!p_cache_dir || !(*p_cache_dir)) - p_cache_dir = OSM_DEFAULT_CACHE_DIR; - - strcpy(file_name, p_cache_dir); - strcat(file_name, "/opensm.opts"); - opts_file = fopen(file_name, "w"); if (!opts_file) { printf("cannot open file \'%s\' for writing: %s\n", -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 8 18:10:23 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:23 +0000 Subject: [ofa-general] [PATCH 2/4] opensm: config file functions return int In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207703425-19039-3-git-send-email-sashak@voltaire.com> config file handling functions (parse and write) will return integer 
values instead of ib_api_status_t (it does nothing with 'ib') - when a failure is not existing config file a positive value will be returned to a caller, other errors will be indicated by a negative return value. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_subnet.h | 35 +++++++++++------------------------ opensm/opensm/main.c | 4 ++-- opensm/opensm/osm_state_mgr.c | 3 +-- opensm/opensm/osm_subnet.c | 24 +++++++++++------------- 4 files changed, 25 insertions(+), 41 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index 98afbd4..5b6cef0 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -1061,8 +1061,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt); * * SYNOPSIS */ -ib_api_status_t osm_subn_parse_conf_file(char *conf_file, - IN osm_subn_opt_t * const p_opt); +int osm_subn_parse_conf_file(char *conf_file, osm_subn_opt_t * const p_opt); /* * PARAMETERS * @@ -1070,14 +1069,8 @@ ib_api_status_t osm_subn_parse_conf_file(char *conf_file, * [in] Pointer to the subnet options structure. * * RETURN VALUES -* IB_SUCCESS, IB_ERROR -* -* NOTES -* Assumes the conf file is part of the cache dir which defaults to -* OSM_DEFAULT_CACHE_DIR or OSM_CACHE_DIR the name is opensm.opts -* -* SEE ALSO -* Subnet object, osm_subn_construct, osm_subn_destroy +* 0 on success, positive value if file doesn't exist, +* negative value otherwise *********/ /****f* OpenSM: Subnet/osm_subn_rescan_conf_files @@ -1090,7 +1083,7 @@ ib_api_status_t osm_subn_parse_conf_file(char *conf_file, * * SYNOPSIS */ -ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); +int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); /* * PARAMETERS * @@ -1098,10 +1091,8 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); * [in] Pointer to the subnet structure. 
* * RETURN VALUES -* IB_SUCCESS, IB_ERROR -* -* NOTES -* This uses the same file as osm_subn_parse_conf_files() +* 0 on success, positive value if file doesn't exist, +* negative value otherwise * *********/ @@ -1110,12 +1101,11 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); * osm_subn_write_conf_file * * DESCRIPTION -* Write the configuration file into the cache +* Write the configuration file into the cache * * SYNOPSIS */ -ib_api_status_t osm_subn_write_conf_file(char *file_name, - IN osm_subn_opt_t * const p_opt); +int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t * const p_opt); /* * PARAMETERS * @@ -1123,14 +1113,11 @@ ib_api_status_t osm_subn_write_conf_file(char *file_name, * [in] Pointer to the subnet options structure. * * RETURN VALUES -* IB_SUCCESS, IB_ERROR +* 0 on success, negative value otherwise * * NOTES -* Assumes the conf file is part of the cache dir which defaults to -* OSM_DEFAULT_CACHE_DIR or OSM_CACHE_DIR the name is opensm.opts -* -* SEE ALSO -* Subnet object, osm_subn_construct, osm_subn_destroy +* Assumes the conf file is part of the cache dir which defaults to +* OSM_DEFAULT_CACHE_DIR or OSM_CACHE_DIR the name is opensm.opts *********/ END_C_DECLS diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 91ee143..da8047e 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -683,7 +683,7 @@ int main(int argc, char *argv[]) cache_dir = OSM_DEFAULT_CACHE_DIR; snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); - if (osm_subn_parse_conf_file(conf_file, &opt) != IB_SUCCESS) + if (osm_subn_parse_conf_file(conf_file, &opt) < 0) printf("\nosm_subn_parse_conf_file failed!\n"); printf("Command Line Arguments:\n"); @@ -1022,7 +1022,7 @@ int main(int argc, char *argv[]) opt.guid = get_port_guid(&osm, opt.guid); if (cache_options == TRUE - && osm_subn_write_conf_file(conf_file, &opt) != IB_SUCCESS) + && osm_subn_write_conf_file(conf_file, &opt)) 
printf("\nosm_subn_write_conf_file failed!\n"); status = osm_opensm_bind(&osm, opt.guid); diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 9b03314..8f1e086 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -1039,8 +1039,7 @@ _repeat_discovery: sm->p_subn->subnet_initialization_error = FALSE; /* rescan configuration updates */ - status = osm_subn_rescan_conf_files(sm->p_subn); - if (status != IB_SUCCESS) + if (osm_subn_rescan_conf_files(sm->p_subn) < 0) OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 331A: " "osm_subn_rescan_conf_file failed\n"); diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index f3f4c52..29a247a 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -777,7 +777,7 @@ osm_parse_prefix_routes_file(IN osm_subn_t * const p_subn) /********************************************************************** **********************************************************************/ -ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) +int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) { FILE *opts_file; char line[1024]; @@ -789,11 +789,11 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) opts_file = fopen(p_subn->opt.config_file, "r"); if (!opts_file) { if (errno == ENOENT) - return IB_SUCCESS; + return 1; OSM_LOG(&p_subn->p_osm->log, OSM_LOG_ERROR, "cannot open file \'%s\': %s\n", p_subn->opt.config_file, strerror(errno)); - return IB_ERROR; + return -1; } while (fgets(line, 1023, opts_file) != NULL) { @@ -828,7 +828,7 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) osm_parse_prefix_routes_file(p_subn); - return IB_SUCCESS; + return 0; } /********************************************************************** @@ -1128,8 +1128,7 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) /********************************************************************** 
**********************************************************************/ -ib_api_status_t osm_subn_parse_conf_file(char *file_name, - IN osm_subn_opt_t * const p_opts) +int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) { char line[1024]; FILE *opts_file; @@ -1138,10 +1137,10 @@ ib_api_status_t osm_subn_parse_conf_file(char *file_name, opts_file = fopen(file_name, "r"); if (!opts_file) { if (errno == ENOENT) - return IB_SUCCESS; + return 1; printf("cannot open file \'%s\': %s\n", file_name, strerror(errno)); - return IB_ERROR; + return -1; } printf(" Reading Cached Option File: %s\n", file_name); @@ -1379,13 +1378,12 @@ ib_api_status_t osm_subn_parse_conf_file(char *file_name, subn_verify_conf_file(p_opts); - return IB_SUCCESS; + return 0; } /********************************************************************** **********************************************************************/ -ib_api_status_t osm_subn_write_conf_file(char *file_name, - IN osm_subn_opt_t * const p_opts) +int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) { FILE *opts_file; @@ -1393,7 +1391,7 @@ ib_api_status_t osm_subn_write_conf_file(char *file_name, if (!opts_file) { printf("cannot open file \'%s\' for writing: %s\n", file_name, strerror(errno)); - return IB_ERROR; + return -1; } fprintf(opts_file, @@ -1715,5 +1713,5 @@ ib_api_status_t osm_subn_write_conf_file(char *file_name, fclose(opts_file); - return IB_SUCCESS; + return 0; } -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 8 18:10:24 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:24 +0000 Subject: [ofa-general] [PATCH 3/4] opensm: option to specify config file In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207703425-19039-4-git-send-email-sashak@voltaire.com> There is a new command line option '--config ' (or '-F'). 
When specified OpenSM will read initial configuration from this file (the format is same as opensm.opts) and not from /var/cache/opensm/opensm.opts file. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/main.c | 21 ++++++++++++++++++++- 1 files changed, 20 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index da8047e..e39037d 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -130,6 +130,11 @@ static void show_usage(void) printf("\n------- OpenSM - Usage and options ----------------------\n"); printf("Usage: opensm [options]\n"); printf("Options:\n"); + printf("-F , --config \n" + " The name of the OpenSM config file. It has a same format\n" + " as opensm.opts option cache file. When not specified\n" + " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" + " will be used (if exists).\n\n"); printf("-c\n" "--cache-options\n" " Cache the given command line options into the file\n" @@ -600,8 +605,9 @@ int main(int argc, char *argv[]) boolean_t cache_options = FALSE; char *ignore_guids_file_name = NULL; uint32_t val; + unsigned config_file_done = 0; const char *const short_option = - "i:f:ed:g:l:L:s:t:a:u:m:R:zM:U:S:P:Y:NBIQvVhorcyxp:n:q:k:C:"; + "F:i:f:ed:g:l:L:s:t:a:u:m:R:zM:U:S:P:Y:NBIQvVhorcyxp:n:q:k:C:"; /* In the array below, the 2nd parameter specifies the number @@ -611,6 +617,7 @@ int main(int argc, char *argv[]) 2: optional */ const struct option long_option[] = { + {"config", 1, NULL, 'F'}, {"debug", 1, NULL, 'd'}, {"guid", 1, NULL, 'g'}, {"ignore_guids", 1, NULL, 'i'}, @@ -691,6 +698,18 @@ int main(int argc, char *argv[]) next_option = getopt_long_only(argc, argv, short_option, long_option, NULL); switch (next_option) { + case 'F': + if (config_file_done) + break; + printf("Reloading config from `%s`:\n", optarg); + if (osm_subn_parse_conf_file(optarg, &opt)) { + printf("cannot parse config file.\n"); + exit(1); + } + printf("Rescaning command line:\n"); + config_file_done = 1; + optind = 0; 
+ break; case 'o': /* Run once option. -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 8 18:10:25 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:25 +0000 Subject: [ofa-general] [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207703425-19039-5-git-send-email-sashak@voltaire.com> Use configurable OSM_DEFAULT_CONFIG_FILE as default (when '-F' option is not specified) OpenSM config file. Default value is $sysconfdir/opensm/opensm.conf. Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 20 ++++++++++++++++++++ opensm/include/opensm/osm_base.h | 21 +++++++++++++++++++++ opensm/opensm/main.c | 25 +++++++++++-------------- 3 files changed, 52 insertions(+), 14 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index a527c91..858eb60 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -106,6 +106,26 @@ AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, [Define OpenSM config directory]) AC_SUBST(OPENSM_CONFIG_DIR) +dnl Check for a different default OpenSm config file +OPENSM_CONFIG_FILE=opensm.conf +AC_MSG_CHECKING(for --with-opensm-conf-file ) +AC_ARG_WITH(opensm-conf-file, + AC_HELP_STRING([--with-opensm-conf-file=file], + [define a default OpenSM config file (default opensm.conf)]), + [ case "$withval" in + no) + ;; + *) + OPENSM_CONFIG_FILE=$withval + ;; + esac ] +) +AC_MSG_RESULT(${OPENSM_CONFIG_FILE}) +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_OPENSM_CONFIG_FILE, + ["$CONF_DIR/$OPENSM_CONFIG_FILE"], + [Define a default OpenSM config file]) +AC_SUBST(OPENSM_CONFIG_FILE) + dnl Check for a different default node name map file NODENAMEMAPFILE=ib-node-name-map AC_MSG_CHECKING(for --with-node-name-map ) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index 62d472e..1bd993e 100644 --- a/opensm/include/opensm/osm_base.h +++ 
b/opensm/include/opensm/osm_base.h @@ -213,6 +213,27 @@ BEGIN_C_DECLS #define OSM_DEFAULT_LOG_FILE "/var/log/opensm.log" #endif /***********/ + +/****d* OpenSM: Base/OSM_DEFAULT_CONFIG_FILE +* NAME +* OSM_DEFAULT_CONFIG_FILE +* +* DESCRIPTION +* Specifies the default OpenSM config file name +* +* SYNOPSIS +*/ +#ifdef __WIN__ +#define OSM_DEFAULT_CONFIG_FILE strcat(GetOsmCachePath(), "opensm.conf") +#elif defined(HAVE_DEFAULT_OPENSM_CONFIG_FILE) +#define OSM_DEFAULT_CONFIG_FILE HAVE_DEFAULT_OPENSM_CONFIG_FILE +#elif define (OPENSM_CONFIG_DIR) +#define OSM_DEFAULT_OPENSM_CONFIG_FILE OPENSM_COFNIG_DIR "/opensm.conf" +#else +#define OSM_DEFAULT_OPENSM_CONFIG_FILE "/etc/opensm/opensm.conf" +#endif /* __WIN__ */ +/***********/ + /****d* OpenSM: Base/OSM_DEFAULT_PARTITION_CONFIG_FILE * NAME * OSM_DEFAULT_PARTITION_CONFIG_FILE diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index e39037d..0576dcc 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -133,8 +133,7 @@ static void show_usage(void) printf("-F , --config \n" " The name of the OpenSM config file. It has a same format\n" " as opensm.opts option cache file. 
When not specified\n" - " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" - " will be used (if exists).\n\n"); + " " OSM_DEFAULT_CONFIG_FILE " will be used (if exists).\n\n"); printf("-c\n" "--cache-options\n" " Cache the given command line options into the file\n" @@ -594,8 +593,6 @@ int main(int argc, char *argv[]) { osm_opensm_t osm; osm_subn_opt_t opt; - char conf_file[256]; - char *cache_dir; ib_net64_t sm_key = 0; ib_api_status_t status; uint32_t temp, dbg_lvl; @@ -684,13 +681,7 @@ int main(int argc, char *argv[]) osm_subn_set_default_opt(&opt); - /* try to open the options file from the cache dir */ - cache_dir = getenv("OSM_CACHE_DIR"); - if (!cache_dir || !(*cache_dir)) - cache_dir = OSM_DEFAULT_CACHE_DIR; - snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); - - if (osm_subn_parse_conf_file(conf_file, &opt) < 0) + if (osm_subn_parse_conf_file(OSM_DEFAULT_CONFIG_FILE, &opt) < 0) printf("\nosm_subn_parse_conf_file failed!\n"); printf("Command Line Arguments:\n"); @@ -1040,9 +1031,15 @@ int main(int argc, char *argv[]) if (opt.guid == 0 || cl_hton64(opt.guid) == CL_HTON64(INVALID_GUID)) opt.guid = get_port_guid(&osm, opt.guid); - if (cache_options == TRUE - && osm_subn_write_conf_file(conf_file, &opt)) - printf("\nosm_subn_write_conf_file failed!\n"); + if (cache_options == TRUE) { + char conf_file[256]; + char *cache_dir = getenv("OSM_CACHE_DIR"); + if (!cache_dir || !(*cache_dir)) + cache_dir = OSM_DEFAULT_CACHE_DIR; + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); + if (osm_subn_write_conf_file(conf_file, &opt)) + printf("\nosm_subn_write_conf_file failed!\n"); + } status = osm_opensm_bind(&osm, opt.guid); if (status != IB_SUCCESS) { -- 1.5.4.1.122.gaa8d From chu11 at llnl.gov Tue Apr 8 16:35:08 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 08 Apr 2008 16:35:08 -0700 Subject: [ofa-general] Re: [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: 
<1207703425-19039-5-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> Message-ID: <1207697708.7695.47.camel@cardanus.llnl.gov> Hey Sasha, Just saw two typos, inlined below. Al On Wed, 2008-04-09 at 01:10 +0000, Sasha Khapyorsky wrote: > Use configurable OSM_DEFAULT_CONFIG_FILE as default (when '-F' option is > not specified) OpenSM config file. Default value is > $sysconfdir/opensm/opensm.conf. > > Signed-off-by: Sasha Khapyorsky > --- > opensm/configure.in | 20 ++++++++++++++++++++ > opensm/include/opensm/osm_base.h | 21 +++++++++++++++++++++ > opensm/opensm/main.c | 25 +++++++++++-------------- > 3 files changed, 52 insertions(+), 14 deletions(-) > > diff --git a/opensm/configure.in b/opensm/configure.in > index a527c91..858eb60 100644 > --- a/opensm/configure.in > +++ b/opensm/configure.in > @@ -106,6 +106,26 @@ AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, > [Define OpenSM config directory]) > AC_SUBST(OPENSM_CONFIG_DIR) > > +dnl Check for a different default OpenSm config file > +OPENSM_CONFIG_FILE=opensm.conf > +AC_MSG_CHECKING(for --with-opensm-conf-file ) > +AC_ARG_WITH(opensm-conf-file, > + AC_HELP_STRING([--with-opensm-conf-file=file], > + [define a default OpenSM config file (default opensm.conf)]), > + [ case "$withval" in > + no) > + ;; > + *) > + OPENSM_CONFIG_FILE=$withval > + ;; > + esac ] > +) > +AC_MSG_RESULT(${OPENSM_CONFIG_FILE}) > +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_OPENSM_CONFIG_FILE, > + ["$CONF_DIR/$OPENSM_CONFIG_FILE"], > + [Define a default OpenSM config file]) > +AC_SUBST(OPENSM_CONFIG_FILE) > + > dnl Check for a different default node name map file > NODENAMEMAPFILE=ib-node-name-map > AC_MSG_CHECKING(for --with-node-name-map ) > diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h > index 62d472e..1bd993e 100644 > --- a/opensm/include/opensm/osm_base.h > +++ b/opensm/include/opensm/osm_base.h > @@ -213,6 +213,27 
@@ BEGIN_C_DECLS > #define OSM_DEFAULT_LOG_FILE "/var/log/opensm.log" > #endif > /***********/ > + > +/****d* OpenSM: Base/OSM_DEFAULT_CONFIG_FILE > +* NAME > +* OSM_DEFAULT_CONFIG_FILE > +* > +* DESCRIPTION > +* Specifies the default OpenSM config file name > +* > +* SYNOPSIS > +*/ > +#ifdef __WIN__ > +#define OSM_DEFAULT_CONFIG_FILE strcat(GetOsmCachePath(), "opensm.conf") > +#elif defined(HAVE_DEFAULT_OPENSM_CONFIG_FILE) > +#define OSM_DEFAULT_CONFIG_FILE HAVE_DEFAULT_OPENSM_CONFIG_FILE > +#elif define (OPENSM_CONFIG_DIR) "define" should be "defined"? (w/ 'd'). > +#define OSM_DEFAULT_OPENSM_CONFIG_FILE OPENSM_COFNIG_DIR "/opensm.conf" typo COFNIG -> CONFIG > +#else > +#define OSM_DEFAULT_OPENSM_CONFIG_FILE "/etc/opensm/opensm.conf" > +#endif /* __WIN__ */ > +/***********/ > + > /****d* OpenSM: Base/OSM_DEFAULT_PARTITION_CONFIG_FILE > * NAME > * OSM_DEFAULT_PARTITION_CONFIG_FILE > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > index e39037d..0576dcc 100644 > --- a/opensm/opensm/main.c > +++ b/opensm/opensm/main.c > @@ -133,8 +133,7 @@ static void show_usage(void) > printf("-F , --config \n" > " The name of the OpenSM config file. It has a same format\n" > " as opensm.opts option cache file. 
When not specified\n" > - " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" > - " will be used (if exists).\n\n"); > + " " OSM_DEFAULT_CONFIG_FILE " will be used (if exists).\n\n"); > printf("-c\n" > "--cache-options\n" > " Cache the given command line options into the file\n" > @@ -594,8 +593,6 @@ int main(int argc, char *argv[]) > { > osm_opensm_t osm; > osm_subn_opt_t opt; > - char conf_file[256]; > - char *cache_dir; > ib_net64_t sm_key = 0; > ib_api_status_t status; > uint32_t temp, dbg_lvl; > @@ -684,13 +681,7 @@ int main(int argc, char *argv[]) > > osm_subn_set_default_opt(&opt); > > - /* try to open the options file from the cache dir */ > - cache_dir = getenv("OSM_CACHE_DIR"); > - if (!cache_dir || !(*cache_dir)) > - cache_dir = OSM_DEFAULT_CACHE_DIR; > - snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); > - > - if (osm_subn_parse_conf_file(conf_file, &opt) < 0) > + if (osm_subn_parse_conf_file(OSM_DEFAULT_CONFIG_FILE, &opt) < 0) > printf("\nosm_subn_parse_conf_file failed!\n"); > > printf("Command Line Arguments:\n"); > @@ -1040,9 +1031,15 @@ int main(int argc, char *argv[]) > if (opt.guid == 0 || cl_hton64(opt.guid) == CL_HTON64(INVALID_GUID)) > opt.guid = get_port_guid(&osm, opt.guid); > > - if (cache_options == TRUE > - && osm_subn_write_conf_file(conf_file, &opt)) > - printf("\nosm_subn_write_conf_file failed!\n"); > + if (cache_options == TRUE) { > + char conf_file[256]; > + char *cache_dir = getenv("OSM_CACHE_DIR"); > + if (!cache_dir || !(*cache_dir)) > + cache_dir = OSM_DEFAULT_CACHE_DIR; > + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); > + if (osm_subn_write_conf_file(conf_file, &opt)) > + printf("\nosm_subn_write_conf_file failed!\n"); > + } > > status = osm_opensm_bind(&osm, opt.guid); > if (status != IB_SUCCESS) { -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From chu11 at llnl.gov Tue Apr 8 
16:41:29 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 08 Apr 2008 16:41:29 -0700 Subject: [ofa-general] Re: [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207697708.7695.47.camel@cardanus.llnl.gov> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> <1207697708.7695.47.camel@cardanus.llnl.gov> Message-ID: <1207698089.7695.49.camel@cardanus.llnl.gov> On Tue, 2008-04-08 at 16:35 -0700, Al Chu wrote: > Hey Sasha, > > Just saw two typos, inlined below. And noticed maybe one more below ... Al > Al > > On Wed, 2008-04-09 at 01:10 +0000, Sasha Khapyorsky wrote: > > Use configurable OSM_DEFAULT_CONFIG_FILE as default (when '-F' option is > > not specified) OpenSM config file. Default value is > > $sysconfdir/opensm/opensm.conf. > > > > Signed-off-by: Sasha Khapyorsky > > --- > > opensm/configure.in | 20 ++++++++++++++++++++ > > opensm/include/opensm/osm_base.h | 21 +++++++++++++++++++++ > > opensm/opensm/main.c | 25 +++++++++++-------------- > > 3 files changed, 52 insertions(+), 14 deletions(-) > > > > diff --git a/opensm/configure.in b/opensm/configure.in > > index a527c91..858eb60 100644 > > --- a/opensm/configure.in > > +++ b/opensm/configure.in > > @@ -106,6 +106,26 @@ AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, > > [Define OpenSM config directory]) > > AC_SUBST(OPENSM_CONFIG_DIR) > > > > +dnl Check for a different default OpenSm config file > > +OPENSM_CONFIG_FILE=opensm.conf > > +AC_MSG_CHECKING(for --with-opensm-conf-file ) > > +AC_ARG_WITH(opensm-conf-file, > > + AC_HELP_STRING([--with-opensm-conf-file=file], > > + [define a default OpenSM config file (default opensm.conf)]), > > + [ case "$withval" in > > + no) > > + ;; > > + *) > > + OPENSM_CONFIG_FILE=$withval > > + ;; > > + esac ] > > +) > > +AC_MSG_RESULT(${OPENSM_CONFIG_FILE}) > > +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_OPENSM_CONFIG_FILE, > > + ["$CONF_DIR/$OPENSM_CONFIG_FILE"], > > + [Define a default OpenSM 
config file]) > > +AC_SUBST(OPENSM_CONFIG_FILE) > > + > > dnl Check for a different default node name map file > > NODENAMEMAPFILE=ib-node-name-map > > AC_MSG_CHECKING(for --with-node-name-map ) > > diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h > > index 62d472e..1bd993e 100644 > > --- a/opensm/include/opensm/osm_base.h > > +++ b/opensm/include/opensm/osm_base.h > > @@ -213,6 +213,27 @@ BEGIN_C_DECLS > > #define OSM_DEFAULT_LOG_FILE "/var/log/opensm.log" > > #endif > > /***********/ > > + > > +/****d* OpenSM: Base/OSM_DEFAULT_CONFIG_FILE > > +* NAME > > +* OSM_DEFAULT_CONFIG_FILE > > +* > > +* DESCRIPTION > > +* Specifies the default OpenSM config file name > > +* > > +* SYNOPSIS > > +*/ > > +#ifdef __WIN__ > > +#define OSM_DEFAULT_CONFIG_FILE strcat(GetOsmCachePath(), "opensm.conf") > > +#elif defined(HAVE_DEFAULT_OPENSM_CONFIG_FILE) > > +#define OSM_DEFAULT_CONFIG_FILE HAVE_DEFAULT_OPENSM_CONFIG_FILE > > +#elif define (OPENSM_CONFIG_DIR) > > "define" should be "defined"? (w/ 'd'). > > > +#define OSM_DEFAULT_OPENSM_CONFIG_FILE OPENSM_COFNIG_DIR "/opensm.conf" > > typo COFNIG -> CONFIG > > > +#else > > +#define OSM_DEFAULT_OPENSM_CONFIG_FILE "/etc/opensm/opensm.conf" OSM_DEFAULT_OPENSM_CONFIG_FILE should be OSM_DEFAULT_CONFIG_FILE? > > +#endif /* __WIN__ */ > > +/***********/ > > + > > /****d* OpenSM: Base/OSM_DEFAULT_PARTITION_CONFIG_FILE > > * NAME > > * OSM_DEFAULT_PARTITION_CONFIG_FILE > > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > > index e39037d..0576dcc 100644 > > --- a/opensm/opensm/main.c > > +++ b/opensm/opensm/main.c > > @@ -133,8 +133,7 @@ static void show_usage(void) > > printf("-F , --config \n" > > " The name of the OpenSM config file. It has a same format\n" > > " as opensm.opts option cache file. 
When not specified\n" > > - " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" > > - " will be used (if exists).\n\n"); > > + " " OSM_DEFAULT_CONFIG_FILE " will be used (if exists).\n\n"); > > printf("-c\n" > > "--cache-options\n" > > " Cache the given command line options into the file\n" > > @@ -594,8 +593,6 @@ int main(int argc, char *argv[]) > > { > > osm_opensm_t osm; > > osm_subn_opt_t opt; > > - char conf_file[256]; > > - char *cache_dir; > > ib_net64_t sm_key = 0; > > ib_api_status_t status; > > uint32_t temp, dbg_lvl; > > @@ -684,13 +681,7 @@ int main(int argc, char *argv[]) > > > > osm_subn_set_default_opt(&opt); > > > > - /* try to open the options file from the cache dir */ > > - cache_dir = getenv("OSM_CACHE_DIR"); > > - if (!cache_dir || !(*cache_dir)) > > - cache_dir = OSM_DEFAULT_CACHE_DIR; > > - snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); > > - > > - if (osm_subn_parse_conf_file(conf_file, &opt) < 0) > > + if (osm_subn_parse_conf_file(OSM_DEFAULT_CONFIG_FILE, &opt) < 0) > > printf("\nosm_subn_parse_conf_file failed!\n"); > > > > printf("Command Line Arguments:\n"); > > @@ -1040,9 +1031,15 @@ int main(int argc, char *argv[]) > > if (opt.guid == 0 || cl_hton64(opt.guid) == CL_HTON64(INVALID_GUID)) > > opt.guid = get_port_guid(&osm, opt.guid); > > > > - if (cache_options == TRUE > > - && osm_subn_write_conf_file(conf_file, &opt)) > > - printf("\nosm_subn_write_conf_file failed!\n"); > > + if (cache_options == TRUE) { > > + char conf_file[256]; > > + char *cache_dir = getenv("OSM_CACHE_DIR"); > > + if (!cache_dir || !(*cache_dir)) > > + cache_dir = OSM_DEFAULT_CACHE_DIR; > > + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); > > + if (osm_subn_write_conf_file(conf_file, &opt)) > > + printf("\nosm_subn_write_conf_file failed!\n"); > > + } > > > > status = osm_opensm_bind(&osm, opt.guid); > > if (status != IB_SUCCESS) { -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High 
Performance Systems Division Lawrence Livermore National Laboratory From weiny2 at llnl.gov Tue Apr 8 16:48:44 2008 From: weiny2 at llnl.gov (weiny2 at llnl.gov) Date: Tue, 8 Apr 2008 16:48:44 -0700 (PDT) Subject: [ofa-general] [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values Message-ID: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> As per Hal's comments, change the alternate value for [leaf] HOQ to be "infinity" when the user specifies a value larger than "infinity". Ira -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-opensm-opensm-osm_subnet.c-add-checks-for-HOQ-and-L.patch Type: / Size: 2306 bytes Desc: not available URL: From arlin.r.davis at intel.com Tue Apr 8 16:51:34 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 8 Apr 2008 16:51:34 -0700 Subject: [ofa-general] [PATCH][v2] dtest: add private data validation with connect and accept. Message-ID: Adding private data validation with connect and accept. Also provide code, under a build option, to validate private data with consumer reject.
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- test/dtest/dtest.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 81 insertions(+), 7 deletions(-) diff --git a/test/dtest/dtest.c b/test/dtest/dtest.c index fa3b9a8..9c8ec71 100755 --- a/test/dtest/dtest.c +++ b/test/dtest/dtest.c @@ -559,8 +559,7 @@ complete: /* close the device */ LOGPRINTF("%d Closing Interface Adaptor\n",getpid()); start = get_time(); - //ret = dat_ia_close( h_ia, DAT_CLOSE_ABRUPT_FLAG ); - ret = dat_ia_close( h_ia, DAT_CLOSE_GRACEFUL_FLAG ); + ret = dat_ia_close( h_ia, DAT_CLOSE_ABRUPT_FLAG ); stop = get_time(); time.close += ((stop - start)*1.0e6); if(ret != DAT_SUCCESS) { @@ -730,7 +729,6 @@ send_msg( void *data, return DAT_SUCCESS; } - DAT_RETURN connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) { @@ -743,6 +741,9 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) DAT_RMR_TRIPLET r_iov; DAT_DTO_COOKIE cookie; int i; + unsigned char *buf; + DAT_CR_PARAM cr_param = { 0 }; + unsigned char pdata[48] = { 0 }; /* Register send message buffer */ LOGPRINTF("%d Registering send Message Buffer %p, len %d\n", @@ -867,17 +868,45 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) getpid(),DT_EventToSTr(event.event_number)); return( DAT_ABORT ); } - + /* use to test rdma_cma timeout logic */ #if defined(_WIN32) || defined(_WIN64) if (delay) Sleep(delay*1000); #else if (delay) sleep(delay); #endif + /* accept connect request from client */ h_cr = event.event_data.cr_arrival_event_data.cr_handle; LOGPRINTF("%d Accepting connect request from client\n",getpid()); - ret = dat_cr_accept( h_cr, h_ep, 0, (DAT_PVOID)0 ); + + /* private data - check and send it back */ + dat_cr_query( h_cr, DAT_CSP_FIELD_ALL, &cr_param); + + buf = (unsigned char*)cr_param.private_data; + LOGPRINTF("%d CONN REQUEST Private Data %p[0]=%d [47]=%d\n", + getpid(),buf,buf[0],buf[47]); + for (i=0;i<48;i++) { + if (buf[i] != i+1) { + fprintf(stderr, "%d Error with CONNECT REQUEST" + " private 
data: %p[%d]=%d s/be %d\n", + getpid(), buf, i, buf[i], i+1); + dat_cr_reject(h_cr, 0, NULL); + return(DAT_ABORT); + } + buf[i]++; /* change for trip back */ + } + +#ifdef TEST_REJECT_WITH_PRIVATE_DATA + printf("%d REJECT request with 48 bytes of private data\n", getpid()); + ret = dat_cr_reject(h_cr, 48, cr_param.private_data); + printf("\n%d: DAPL Test Complete. %s\n\n", + getpid(), ret?"FAILED":"PASSED"); + exit(0); +#endif + + ret = dat_cr_accept(h_cr, h_ep, 48, cr_param.private_data); + if(ret != DAT_SUCCESS) { fprintf(stderr, "%d Error dat_cr_accept: %s\n", getpid(),DT_RetToString(ret)); @@ -911,13 +940,16 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) remote_addr = *((DAT_IA_ADDRESS_PTR)target->ai_addr); freeaddrinfo(target); + for (i=0;i<48;i++) /* simple pattern in private data */ + pdata[i]=i+1; + LOGPRINTF("%d Connecting to server\n",getpid()); ret = dat_ep_connect( h_ep, &remote_addr, conn_id, CONN_TIMEOUT, - 0, - (DAT_PVOID)0, + 48, + (DAT_PVOID)pdata, 0, DAT_CONNECT_DEFAULT_FLAG ); if(ret != DAT_SUCCESS) { @@ -940,11 +972,53 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) else LOGPRINTF("%d dat_evd_wait for h_conn_evd completed\n", getpid()); +#ifdef TEST_REJECT_WITH_PRIVATE_DATA + if (event.event_number != DAT_CONNECTION_EVENT_PEER_REJECTED) { + fprintf(stderr, "%d expected conn reject event : %s\n", + getpid(),DT_EventToSTr(event.event_number)); + return( DAT_ABORT ); + } + /* get the reject private data and validate */ + buf = (unsigned char*)event.event_data.connect_event_data.private_data; + printf("%d Received REJECT with private data %p[0]=%d [47]=%d\n", + getpid(),buf,buf[0],buf[47]); + for (i=0;i<48;i++) { + if (buf[i] != i+2) { + fprintf(stderr, "%d client: Error with REJECT event" + " private data: %p[%d]=%d s/be %d\n", + getpid(), buf, i, buf[i], i+2); + dat_ep_disconnect( h_ep, DAT_CLOSE_ABRUPT_FLAG); + return(DAT_ABORT); + } + } + printf("\n%d: DAPL Test Complete. 
PASSED\n\n", getpid()); + exit(0); +#endif + if ( event.event_number != DAT_CONNECTION_EVENT_ESTABLISHED ) { fprintf(stderr, "%d Error unexpected conn event : %s\n", getpid(),DT_EventToSTr(event.event_number)); return( DAT_ABORT ); } + + /* check private data back from server */ + if (!server) { + buf = (unsigned char*)event.event_data.connect_event_data.private_data; + LOGPRINTF("%d CONN Private Data %p[0]=%d [47]=%d\n", + getpid(),buf,buf[0],buf[47]); + for (i=0;i<48;i++) { + if (buf[i] != i+2) { + fprintf(stderr, "%d Error with CONNECT event" + " private data: %p[%d]=%d s/be %d\n", + getpid(), buf, i, buf[i], i+2); + dat_ep_disconnect(h_ep, DAT_CLOSE_ABRUPT_FLAG); + LOGPRINTF("%d waiting for disconnect event...\n", getpid()); + dat_evd_wait(h_conn_evd, DAT_TIMEOUT_INFINITE, 1, &event, &nmore); + return(DAT_ABORT); + } + } + } + printf("\n%d CONNECTED!\n\n",getpid()); connected = 1; -- 1.5.2.5 From arlin.r.davis at intel.com Tue Apr 8 16:51:27 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 8 Apr 2008 16:51:27 -0700 Subject: [ofa-general] [PATCH][v2] dapl: add hooks in evd connection callback code to deliver private data with consumer reject. Message-ID: <001301c899d3$76305f90$14fd070a@amr.corp.intel.com> PEER rejects can include private data. The common code didn't support delivery via the connect event data structure. Add the necessary hooks in dapl_evd_connection_callback function and include checks in openib_cma provider to check and deliver properly. Also, fix the private data size check in dapls_ib_reject_connection function. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_connection_callb.c | 22 ++++++++++++++++++++-- dapl/openib_cma/dapl_ib_cm.c | 16 ++++++++++------ 2 files changed, 30 insertions(+), 8 deletions(-) diff --git a/dapl/common/dapl_evd_connection_callb.c b/dapl/common/dapl_evd_connection_callb.c index d3a39a6..7f994b0 100644 --- a/dapl/common/dapl_evd_connection_callb.c +++ b/dapl/common/dapl_evd_connection_callb.c @@ -164,8 +164,26 @@ dapl_evd_connection_callback ( break; } - case DAT_CONNECTION_EVENT_DISCONNECTED: case DAT_CONNECTION_EVENT_PEER_REJECTED: + { + /* peer reject may include private data */ + if (prd_ptr != NULL) + private_data_size = + dapls_ib_private_data_size( + prd_ptr, DAPL_PDATA_CONN_REJ, + ep_ptr->header.owner_ia->hca_ptr); + + if (private_data_size > 0) + dapl_os_memcpy (ep_ptr->private.private_data, + prd_ptr->private_data, + DAPL_MIN (private_data_size, + DAPL_MAX_PRIVATE_DATA_SIZE)); + + dapl_dbg_log(DAPL_DBG_TYPE_CM | DAPL_DBG_TYPE_CALLBACK, + "dapl_evd_connection_callback PEER REJ pd=%p sz=%d\n", + prd_ptr, private_data_size); + } + case DAT_CONNECTION_EVENT_DISCONNECTED: case DAT_CONNECTION_EVENT_UNREACHABLE: case DAT_CONNECTION_EVENT_NON_PEER_REJECTED: { @@ -205,7 +223,7 @@ dapl_evd_connection_callback ( evd_ptr, dat_event_num, (DAT_HANDLE) ep_ptr, - private_data_size, /* 0 except for CONNECTED */ + private_data_size, /* CONNECTED or REJECT */ ep_ptr->private.private_data ); if (dat_status != DAT_SUCCESS && diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index 9b2062b..d3835b3 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -336,6 +336,7 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, case RDMA_CM_EVENT_REJECTED: { ib_cm_events_t cm_event; + unsigned char *pdata = NULL; dapl_dbg_log( DAPL_DBG_TYPE_CM, @@ -344,9 +345,11 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, /* valid REJ from consumer will always contain private data */ if 
(event->status == 28 && - event->param.conn.private_data_len) + event->param.conn.private_data_len) { cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; - else { + pdata = (unsigned char*)event->param.conn.private_data + + sizeof(struct dapl_pdata_hdr); + } else { cm_event = IB_CME_DESTINATION_REJECT; dapl_log(DAPL_DBG_TYPE_WARN, "dapl_cma_active: non-consumer REJ," @@ -357,7 +360,7 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, ntohs(((struct sockaddr_in *) &conn->cm_id->route.addr.dst_addr)->sin_port)); } - dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep); + dapl_evd_connection_callback(conn, cm_event, pdata, conn->ep); break; } case RDMA_CM_EVENT_ESTABLISHED: @@ -910,8 +913,9 @@ dapls_ib_reject_connection( }; dapl_dbg_log(DAPL_DBG_TYPE_CM, - " reject: cm_handle %p reason %x, ver=0x%x \n", - cm_handle, reason, ntohl(pdata_hdr.version)); + " reject: handle %p reason %x, ver=%x, data %p, sz=%d\n", + cm_handle, reason, ntohl(pdata_hdr.version), + private_data, private_data_size); if (cm_handle == IB_INVALID_HANDLE) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, @@ -922,7 +926,7 @@ dapls_ib_reject_connection( if (private_data_size > dapls_ib_private_data_size( - NULL, IB_MAX_REJ_PDATA_SIZE, cm_handle->hca)) + NULL, DAPL_PDATA_CONN_REJ, cm_handle->hca)) return DAT_ERROR(DAT_INVALID_PARAMETER, DAT_INVALID_ARG3); /* setup pdata_hdr and users data, in CR pdata buffer */ -- 1.5.2.5 From tom at opengridcomputing.com Tue Apr 8 19:11:03 2008 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 08 Apr 2008 21:11:03 -0500 Subject: [ofa-general] Re: [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr In-Reply-To: References: <1207336240.1363.20.camel@trinity.ogc.int> <1207337563.1363.22.camel@trinity.ogc.int> Message-ID: <1207707063.9447.59.camel@trinity.ogc.int> On Fri, 2008-04-04 at 12:35 -0700, Roland Dreier wrote: > > I'm up to my eyeballs right now. If it's ok with you I'd say defer the > > refactoring. > > No problem, I'll queue this up and if you ever get time to work on > amso1100 you can send the refactoring. > > But are you working on a pmtu fix? Steve and I will noodle on what to do here and post something. > - R. From krkumar2 at in.ibm.com Tue Apr 8 22:32:01 2008 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Wed, 9 Apr 2008 11:02:01 +0530 Subject: [ofa-general] Test programs supporting RNIC's. In-Reply-To: <47FA31C3.5090307@opengridcomputing.com> Message-ID: Hi, I am testing Chelsio cxgb3 RNICS on RHEL5.2 (beta). The following list of applications are installed on the system as part of OFED install option: ib_clock_test ib_read_lat ib_write_bw_postlist ib_rdma_bw ib_send_bw ib_write_lat ib_rdma_lat ib_send_lat ib_read_bw Out of this, only ib_rdma_bw seems to be CMA enabled. Is this the only program that supports RNIC's?
Thanks, - KK From sashak at voltaire.com Wed Apr 9 03:01:08 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 10:01:08 +0000 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> Message-ID: <20080409100108.GB19834@sashak.voltaire.com> Hi Ira, On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: > As per Hal's comments change the alternate value for [leaf] HOQ to be > "infinity" when the user specifies a value larger than "infinity". Actually I would prefer the original version of the patch. The main reason is that an infinite packet lifetime is a really dangerous thing - when a fabric is routed with credit loops (a very common case with the default min-hops routing) it leads to a total fabric lockup, not just to some performance degradation. So I think it is safer to reject an invalid value and to set the default (log an error, etc.), as was done in the original version of the patch. Hal, do you agree? Sasha From sashak at voltaire.com Wed Apr 9 03:09:31 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 10:09:31 +0000 Subject: [ofa-general] Re: [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207697708.7695.47.camel@cardanus.llnl.gov> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> <1207697708.7695.47.camel@cardanus.llnl.gov> Message-ID: <20080409100931.GD19834@sashak.voltaire.com> On 16:35 Tue 08 Apr , Al Chu wrote: > Hey Sasha, > > Just saw two typos, inlined below. Thanks for catching this! I'm going to fix them.
Sasha From sashak at voltaire.com Wed Apr 9 03:15:26 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 10:15:26 +0000 Subject: [ofa-general] Re: [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207698089.7695.49.camel@cardanus.llnl.gov> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> <1207697708.7695.47.camel@cardanus.llnl.gov> <1207698089.7695.49.camel@cardanus.llnl.gov> Message-ID: <20080409101526.GE19834@sashak.voltaire.com> On 16:41 Tue 08 Apr , Al Chu wrote: > On Tue, 2008-04-08 at 16:35 -0700, Al Chu wrote: > > Hey Sasha, > > > > Just saw two typos, inlined below. > > And noticed maybe one more below ... Sure, it is another one. Thanks for catching this! Sasha From dotanb at dev.mellanox.co.il Wed Apr 9 00:20:17 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 09 Apr 2008 10:20:17 +0300 Subject: [ofa-general] Test programs supporting RNIC's. In-Reply-To: References: Message-ID: <47FC6E31.8060208@dev.mellanox.co.il> Krishna Kumar2 wrote: > Hi, > > I am testing Chelsio cxgb3 RNICS on RHEL5.2 (beta). The following > list of applications are installed on the system as part of OFED > install option: > > ib_clock_test ib_read_lat ib_write_bw_postlist > ib_rdma_bw ib_send_bw ib_write_lat > ib_rdma_lat ib_send_lat > ib_read_bw > > Out of this, only ib_rdma_bw seems to be CMA enabled. Is this the > only program that supports RNIC's? > Yes. I know that there are plans to add CMA support to all of the ib_* applications as well (but today, only ib_rdma_bw and ib_rdma_lat support it).
Dotan From sashak at voltaire.com Wed Apr 9 03:20:06 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 10:20:06 +0000 Subject: [ofa-general] [PATCH 4/4 v2] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207703425-19039-5-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> Message-ID: <20080409102006.GF19834@sashak.voltaire.com> Use configurable OSM_DEFAULT_CONFIG_FILE as default (when '-F' option is not specified) OpenSM config file. Default value is $sysconfdir/opensm/opensm.conf. Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 20 ++++++++++++++++++++ opensm/include/opensm/osm_base.h | 21 +++++++++++++++++++++ opensm/opensm/main.c | 25 +++++++++++-------------- 3 files changed, 52 insertions(+), 14 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index a527c91..858eb60 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -106,6 +106,26 @@ AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, [Define OpenSM config directory]) AC_SUBST(OPENSM_CONFIG_DIR) +dnl Check for a different default OpenSm config file +OPENSM_CONFIG_FILE=opensm.conf +AC_MSG_CHECKING(for --with-opensm-conf-file ) +AC_ARG_WITH(opensm-conf-file, + AC_HELP_STRING([--with-opensm-conf-file=file], + [define a default OpenSM config file (default opensm.conf)]), + [ case "$withval" in + no) + ;; + *) + OPENSM_CONFIG_FILE=$withval + ;; + esac ] +) +AC_MSG_RESULT(${OPENSM_CONFIG_FILE}) +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_OPENSM_CONFIG_FILE, + ["$CONF_DIR/$OPENSM_CONFIG_FILE"], + [Define a default OpenSM config file]) +AC_SUBST(OPENSM_CONFIG_FILE) + dnl Check for a different default node name map file NODENAMEMAPFILE=ib-node-name-map AC_MSG_CHECKING(for --with-node-name-map ) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index 62d472e..289e49e 100644 --- a/opensm/include/opensm/osm_base.h +++ 
b/opensm/include/opensm/osm_base.h @@ -213,6 +213,27 @@ BEGIN_C_DECLS #define OSM_DEFAULT_LOG_FILE "/var/log/opensm.log" #endif /***********/ + +/****d* OpenSM: Base/OSM_DEFAULT_CONFIG_FILE +* NAME +* OSM_DEFAULT_CONFIG_FILE +* +* DESCRIPTION +* Specifies the default OpenSM config file name +* +* SYNOPSIS +*/ +#ifdef __WIN__ +#define OSM_DEFAULT_CONFIG_FILE strcat(GetOsmCachePath(), "opensm.conf") +#elif defined(HAVE_DEFAULT_OPENSM_CONFIG_FILE) +#define OSM_DEFAULT_CONFIG_FILE HAVE_DEFAULT_OPENSM_CONFIG_FILE +#elif defined (OPENSM_CONFIG_DIR) +#define OSM_DEFAULT_CONFIG_FILE OPENSM_CONFIG_DIR "/opensm.conf" +#else +#define OSM_DEFAULT_CONFIG_FILE "/etc/opensm/opensm.conf" +#endif /* __WIN__ */ +/***********/ + /****d* OpenSM: Base/OSM_DEFAULT_PARTITION_CONFIG_FILE * NAME * OSM_DEFAULT_PARTITION_CONFIG_FILE diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index e39037d..0576dcc 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -133,8 +133,7 @@ static void show_usage(void) printf("-F , --config \n" " The name of the OpenSM config file. It has a same format\n" " as opensm.opts option cache file. 
When not specified\n" - " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" - " will be used (if exists).\n\n"); + " " OSM_DEFAULT_CONFIG_FILE " will be used (if exists).\n\n"); printf("-c\n" "--cache-options\n" " Cache the given command line options into the file\n" @@ -594,8 +593,6 @@ int main(int argc, char *argv[]) { osm_opensm_t osm; osm_subn_opt_t opt; - char conf_file[256]; - char *cache_dir; ib_net64_t sm_key = 0; ib_api_status_t status; uint32_t temp, dbg_lvl; @@ -684,13 +681,7 @@ int main(int argc, char *argv[]) osm_subn_set_default_opt(&opt); - /* try to open the options file from the cache dir */ - cache_dir = getenv("OSM_CACHE_DIR"); - if (!cache_dir || !(*cache_dir)) - cache_dir = OSM_DEFAULT_CACHE_DIR; - snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); - - if (osm_subn_parse_conf_file(conf_file, &opt) < 0) + if (osm_subn_parse_conf_file(OSM_DEFAULT_CONFIG_FILE, &opt) < 0) printf("\nosm_subn_parse_conf_file failed!\n"); printf("Command Line Arguments:\n"); @@ -1040,9 +1031,15 @@ int main(int argc, char *argv[]) if (opt.guid == 0 || cl_hton64(opt.guid) == CL_HTON64(INVALID_GUID)) opt.guid = get_port_guid(&osm, opt.guid); - if (cache_options == TRUE - && osm_subn_write_conf_file(conf_file, &opt)) - printf("\nosm_subn_write_conf_file failed!\n"); + if (cache_options == TRUE) { + char conf_file[256]; + char *cache_dir = getenv("OSM_CACHE_DIR"); + if (!cache_dir || !(*cache_dir)) + cache_dir = OSM_DEFAULT_CACHE_DIR; + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); + if (osm_subn_write_conf_file(conf_file, &opt)) + printf("\nosm_subn_write_conf_file failed!\n"); + } status = osm_opensm_bind(&osm, opt.guid); if (status != IB_SUCCESS) { -- 1.5.4.1.122.gaa8d From krkumar2 at in.ibm.com Wed Apr 9 00:39:54 2008 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Wed, 9 Apr 2008 13:09:54 +0530 Subject: [ofa-general] Test programs supporting RNIC's. 
In-Reply-To: <47FC6E31.8060208@dev.mellanox.co.il> Message-ID: Hi Dotan, > > ib_clock_test ib_read_lat ib_write_bw_postlist > > ib_rdma_bw ib_send_bw ib_write_lat > > ib_rdma_lat ib_send_lat > > ib_read_bw > > > > Out of this, only ib_rdma_bw seems to be CMA enabled. Is this the > > only program that supports RNIC's? > > > Yes. > > I know that there are planing to add the CMA support to all of the ib_* > applications as well > (but today, only ib_rdma_bw and ib_rdma_lat support it). Yes, I had forgotten ib_rdma_lat on the list of CMA enabled apps. But somehow it didn't work for me. I need to reboot to the distro OS and locate the error, will post it later. Thanks, - KK From holt at sgi.com Wed Apr 9 06:17:09 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 9 Apr 2008 08:17:09 -0500 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: References: Message-ID: <20080409131709.GR11364@sgi.com> I applied this patch set with the xpmem version I am working up for submission and the basic level-1 and level-2 tests passed. The full mpi regression test still tends to hang, but that failure appears to be common to both the emm and mmu notifier methods, and therefore I am certain it is a problem in my code. Please note this is not an endorsement of one method over the other, merely that under conditions where we would expect xpmem to pass the regression tests, it does pass those tests. Thanks, Robin On Tue, Apr 08, 2008 at 05:44:03PM +0200, Andrea Arcangeli wrote: > The difference with #v11 is a different implementation of mm_lock that > guarantees handling signals in O(N). It's also more lowlatency friendly. > > Note that mmu_notifier_unregister may also fail with -EINTR if there are > signal pending or the system runs out of vmalloc space or physical memory, > only exit_mmap guarantees that any kernel module can be unloaded in presence > of an oom condition.
> > Either #v11 or the first three #v12 1,2,3 patches are suitable for inclusion > in -mm, pick what you prefer looking at the mmu_notifier_register retval and > mm_lock retval difference, I implemented and slighty tested both. GRU and KVM > only needs 1,2,3, XPMEM needs the rest of the patchset too (4, ...) but all > patches from 4 to the end can be deffered to a second merge window. From hrosenstock at xsigo.com Wed Apr 9 06:27:09 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 06:27:09 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <20080409100108.GB19834@sashak.voltaire.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> Message-ID: <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: > Hi Ira, > > On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: > > As per Hal's comments change the alternate value for [leaf] HOQ to be > > "infinity" when the user specifies a value larger than "infinity". > > Actually I would prefer original version of the patch. The main reason > is that infinite packet life time is really dangerous thing - in case > when a fabric is routed with credit loops (very common case with default > min-hops routing) it leads to total fabric stuck and not just to some > performance degradation. > > So I think it is safer to reject invalid value and to set the default > (log an error, etc.i). As it was done in the original version of the > patch. > > Hal, do you agree? Safer yes but I think it is less to the intent of the admin who just doesn't understand the max value for this and that's why I proposed this change. My preference is to max it out but it comes down to a judgment call. There's a downside either way. -- Hal > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From swise at opengridcomputing.com Wed Apr 9 07:03:35 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 09 Apr 2008 09:03:35 -0500 Subject: [ofa-general] Test programs supporting RNIC's.
In-Reply-To: References: Message-ID: <47FCCCB7.2080407@opengridcomputing.com> Krishna Kumar2 wrote: > Hi Dotan, > > >>> ib_clock_test ib_read_lat ib_write_bw_postlist >>> ib_rdma_bw ib_send_bw ib_write_lat >>> ib_rdma_lat ib_send_lat >>> ib_read_bw >>> >>> Out of this, only ib_rdma_bw seems to be CMA enabled. Is this the >>> only program that supports RNIC's? >>> >>> >> Yes. >> >> I know that there are planing to add the CMA support to all of the ib_* >> applications as well >> (but today, only ib_rdma_bw and ib_rdma_lat support it). >> > > Yes, I had forgotten ib_rdma_lat on the list of CMA enabled apps. But > somehow it didn't work for me. I need to reboot to the distro OS and > locate the error, will post it later. > > Krishna, if you are interested, you could add cma support to the rest of these. I can help by answering questions and/or testing things... Steve. From Brian.Murrell at Sun.COM Wed Apr 9 07:07:46 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Wed, 09 Apr 2008 10:07:46 -0400 Subject: [ofa-general] ipath_kernel.h:1115: error: implicit declaration of function 'writeq' on rhel5 Message-ID: <1207750066.3303.28.camel@pc.ilinx> I'm trying to build OFED 1.3's kernel-ib for RHEL5 and getting: gcc -m32 -Wp,-MD,/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/.ipath_cq.o.d -nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.1.1/include -D__KERNEL__ \ -include include/linux/autoconf.h \ -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ \ \ -I/cache/build/BUILD/ofa_kernel-1.3/include \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ -I/usr/local/include/scst \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ -Iinclude \ \ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Wstrict-prototypes -Wundef 
-Werror-implicit-function-declaration -Os -pipe -msoft-float -fno-builtin-sprintf -fno-builtin-log2 -fno-builtin-puts -mpreferred-stack-boundary=2 -march=i686 -mtune=generic -mtune=generic -mregparm=3 -ffreestanding -Iinclude/asm-i386/mach-generic -Iinclude/asm-i386/mach-default -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -DIPATH_IDSTR='"QLogic kernel.org driver"' -DIPATH_KERN_TYPE=0 -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(ipath_cq)" -D"KBUILD_MODNAME=KBUILD_STR(ib_ipath)" -c -o /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/.tmp_ipath_cq.o /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_cq.c In file included from /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_verbs.h:45, from /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_cq.c:37: /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h: In function 'ipath_write_ureg': /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h:1115: error: implicit declaration of function 'writeq' /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h: In function 'ipath_read_kreg64': /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h:1132: error: implicit declaration of function 'readq' make[4]: *** [/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_cq.o] Error 1 make[3]: *** [/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband] Error 2 make[1]: *** [_module_/cache/build/BUILD/ofa_kernel-1.3] Error 2 The "make kernel" starts out with: + make kernel Building kernel modules Kernel version: 2.6.18-53.1.14.el5_lustre.1.6.4.55.20080409120349smp Modules directory: //lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080409120349smp Kernel sources: /cache/build/BUILD/lustre-kernel-2.6.18/lustre/linux env 
CWD=/cache/build/BUILD/ofa_kernel-1.3 BACKPORT_INCLUDES=-I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ make -C /cache/build/BUILD/lustre-kernel-2.6.18/lustre/linux SUBDIRS="/cache/build/BUILD/ofa_kernel-1.3" \ V=1 \ CONFIG_MEMTRACK= \ CONFIG_DEBUG_INFO=y \ CONFIG_INFINIBAND=m \ So it seems to be correctly identifying the kernel version "2.6.18-53.1.14.el5" as RHEL5 kernel as it does set BACKPORT_INCLUDES=-I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ Any ideas why there is no readq/writeq being found? Funny enough this same build on x86_64 is successful. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From andrea at qumranet.com Wed Apr 9 07:29:45 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 9 Apr 2008 16:29:45 +0200 Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic In-Reply-To: References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> <20080407060602.GE9309@duo.random> <20080407071330.GH9309@duo.random> Message-ID: <20080409142945.GS10133@duo.random> On Tue, Apr 08, 2008 at 01:23:33PM -0700, Christoph Lameter wrote: > It may also be useful to allow invalidate_start() to fail in some contexts > (try_to_unmap f.e., maybe if a certain flag is passed). This may allow the > device to get out of tight situations (pending I/O f.e. or time out if > there is no response for network communications). But then that > complicates the API. That also complicates the fact that there can't be a spte mapped and a pte not mapped or the spte would leak unswappable memory, so a failure should re-establish the pte and undo the ptep_clear_flush or equivalent... I think we can change the API later if needed. 
This is an internal-only API invisible to userland so it can change and break anytime to make the whole kernel faster and better (ask Greg for kernel internal APIs). One important detail is that because the secondary mmu page fault can happen concurrently against invalidate_page (there wasn't a range_begin to block it), the secondary mmu page fault must ensure that the pte is still established, before establishing the spte (with proper locking that will block a concurrent invalidate_page). Having a range_begin before the ptep_clear_flush effectively makes life a bit easier but it's not needed as those are locking issues that the driver can solve (unlike range_begin being missed, now fixed by mm_lock) and this allows for higher performance both when the lock is armed and disarmed. I'm going to solve all the locking for kvm with spinlocks and/or seqlocks to avoid any dependency on the patches that make the mmu notifier sleep capable. From andrea at qumranet.com Wed Apr 9 07:44:01 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 9 Apr 2008 16:44:01 +0200 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <20080409131709.GR11364@sgi.com> References: <20080409131709.GR11364@sgi.com> Message-ID: <20080409144401.GT10133@duo.random> On Wed, Apr 09, 2008 at 08:17:09AM -0500, Robin Holt wrote: > I applied this patch set with the xpmem version I am working up for > submission and the basic level-1 and level-2 tests passed. The full mpi > regression test still tends to hang, but that appears to be a common > problem failure affecting either emm or mmu notifiers and therefore, I > am certain is a problem in my code. > > Please note this is not an endorsement of one method over the other, > merely that under conditions where we would expect xpmem to pass the > regression tests, it does pass those tests. Thanks a lot for testing! #v12 works great with KVM too.
(I'm now in the process of changing the KVM patch to drop the page pinning.) BTW, how did you implement invalidate_page? As this?

    invalidate_page() {
        invalidate_range_begin()
        invalidate_range_end()
    }

If yes, I prefer to remind you that normally invalidate_range_begin is always called before zapping the pte. In the invalidate_page case instead, invalidate_range_begin is called _after_ the pte has been zapped already. Now there's no problem if the pte is established and the spte isn't established. But it must never happen that the spte is established and the pte isn't established (with page-pinning that means an unswappable memlock leak; without page-pinning it would mean memory corruption). So the range_begin must serialize against the secondary mmu page fault so that it can't establish the spte on a pte that was zapped by the rmap code after get_user_pages/follow_page returned. I think your range_begin already does that so you should be ok, but I wanted to remind you about this slight difference in implementing invalidate_page as I suggested above in the previous email, just to be sure ;). This is the race you must guard against in invalidate_page:

    CPU0                              CPU1
    try_to_unmap on page
                                      secondary mmu page fault
                                      get_user_pages()/follow_page found a page
    ptep_clear_flush
    invalidate_page()
      invalidate_range_begin()
      invalidate_range_end()
    return from invalidate_page
                                      establish spte on page
                                      return from secondary mmu page fault

If your range_begin already serializes in a hard way against the secondary mmu page fault, my previously "trivial" suggested implementation for invalidate_page should work just fine, and this saves 1 branch for each try_to_unmap_one compared to the emm implementation. The branch check is inlined and it checks against the mmu_notifier_head that is the hot cacheline; no new cacheline is checked, just one branch is saved, so it's worth it IMHO even if it doesn't provide any other advantage if you implement it the way above.
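[Editor's note: to make the serialization requirement above concrete, here is a minimal, self-contained sketch in plain C with pthreads. It is a userspace model, not the actual kernel mmu-notifier API or the KVM/xpmem code: `struct mm_state`, `invalidate_page()` and `secondary_mmu_fault()` are invented names. It models invalidate_page() as the suggested range_begin/range_end pair (a lock/unlock), and shows why the secondary-MMU fault handler must re-check the pte under the same lock before establishing the spte.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct mm_state {
	pthread_mutex_t lock;	/* stands in for the driver's invalidate lock */
	bool pte_present;	/* primary pte */
	bool spte_present;	/* secondary (shadow) pte */
};

/*
 * invalidate_page() modeled as the suggested range_begin/range_end
 * pair: by the time it runs, the caller (try_to_unmap) has already
 * zapped the pte with ptep_clear_flush.
 */
static void invalidate_page(struct mm_state *mm)
{
	pthread_mutex_lock(&mm->lock);		/* range_begin */
	mm->spte_present = false;		/* tear down shadow mapping */
	pthread_mutex_unlock(&mm->lock);	/* range_end */
}

/*
 * Secondary MMU page fault: may only establish the spte if the pte is
 * still present when re-checked under the same lock, otherwise the
 * "spte established / pte zapped" state could leak or corrupt memory.
 */
static bool secondary_mmu_fault(struct mm_state *mm)
{
	bool established = false;

	pthread_mutex_lock(&mm->lock);
	if (mm->pte_present) {
		mm->spte_present = true;
		established = true;
	}
	pthread_mutex_unlock(&mm->lock);
	return established;
}
```

Without the re-check under the lock, a fault that looked up the page via get_user_pages()/follow_page before the zap could map the spte after invalidate_page() returned, which is exactly the race in the diagram above.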
From huanwei at cse.ohio-state.edu Wed Apr 9 08:18:19 2008 From: huanwei at cse.ohio-state.edu (wei huang) Date: Wed, 9 Apr 2008 11:18:19 -0400 (EDT) Subject: [ofa-general] MVAPICH2 crashes on mixed fabric In-Reply-To: Message-ID: Hi Mike, Is the arbel based DDR cards? If so, try put: -env MV2_DEFAULT_MTU IBV_MTU_2048 in addition to the environmental variables you are using. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Tue, 8 Apr 2008, Mike Heinz wrote: > Wei, > > No joy. The following command: > > + /usr/mpi/pgi/mvapich2-1.0.2/bin/mpiexec -1 -machinefile > /home/mheinz/mvapich2-pgi/mpi_hosts -n 4 -env MV2_USE_COALESCE 0 -env > MV2_VBUF_TOTAL_SIZE 9216 PMB2.2.1/SRC_PMB/PMB-MPI1 > > Produced the following error: > > [0] Abort: Got FATAL event 3 > at line 796 in file ibv_channel_manager.c > rank 0 in job 48 compute-0-3.local_33082 caused collective abort of > all ranks > exit status of rank 0: killed by signal 9 > + set +x > > Note that compute-0-3 has a connect-x HCA. > > If I restrict the ring to only nodes with connect-x the problem does not > occur. > > This isn't a huge problem for me; this 4-node cluster is actually for > testing the creation of Rocks Rolls and I can simply record it as a > known limitation when using mvapich2 - but it could impact users in the > field if a cluster gets extended with newer HCAs. > > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > -----Original Message----- > From: wei huang [mailto:huanwei at cse.ohio-state.edu] > Sent: Sunday, April 06, 2008 8:58 PM > To: Mike Heinz > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] MVAPICH2 crashes on mixed fabric > > Hi Mike, > > Currently mvapich2 will detect different HCA type and thus select > different parameters for communication, which may cause the problem. 
We > are working on this feature and it will be available in our next > release. > For now, if you want to run on this setup, please set few environmental > variables like: > > mpiexec -n 2 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 > ./a.out > > Please let us know if this works. Thanks. > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. of Computer Science and Engineering Ohio State University OH 43210 > Tel: (614)292-8501 > > > On Fri, 4 Apr 2008, Mike Heinz wrote: > > > Hey, all, I'm not sure if this is a known bug or some sort of > > limitation I'm unaware of, but I've been building and testing with the > > > OFED 1.3 GA release on a small fabric that has a mix of Arbel-based > > and newer Connect-X HCAs. > > > > What I've discovered is that mvapich and openmpi work fine across the > > entire fabric, but mvapich2 crashes when I use a mix of Arbels and > > Connect-X. The errors vary depending on the test program but here's an > > example: > > > > [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1 . > > . > > . > > (output snipped) > > . > > . > > . 
> >
> > #-----------------------------------------------------------------------------
> > # Benchmarking Sendrecv
> > # #processes = 2
> > # ( 3 additional processes waiting in MPI_Barrier)
> > #-----------------------------------------------------------------------------
> >        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
> >             0         1000         3.51         3.51         3.51         0.00
> >             1         1000         3.63         3.63         3.63         0.52
> >             2         1000         3.67         3.67         3.67         1.04
> >             4         1000         3.64         3.64         3.64         2.09
> >             8         1000         3.67         3.67         3.67         4.16
> >            16         1000         3.67         3.67         3.67         8.31
> >            32         1000         3.74         3.74         3.74        16.32
> >            64         1000         3.90         3.90         3.90        31.28
> >           128         1000         4.75         4.75         4.75        51.39
> >           256         1000         5.21         5.21         5.21        93.79
> >           512         1000         5.96         5.96         5.96       163.77
> >          1024         1000         7.88         7.89         7.89       247.54
> >          2048         1000        11.42        11.42        11.42       342.00
> >          4096         1000        15.33        15.33        15.33       509.49
> >          8192         1000        22.19        22.20        22.20       703.83
> >         16384         1000        34.57        34.57        34.57       903.88
> >         32768         1000        51.32        51.32        51.32      1217.94
> >         65536          640        85.80        85.81        85.80      1456.74
> >        131072          320       155.23       155.24       155.24      1610.40
> >        262144          160       301.84       301.86       301.85      1656.39
> >        524288           80       598.62       598.69       598.66      1670.31
> >       1048576           40      1175.22      1175.30      1175.26      1701.69
> >       2097152           20      2309.05      2309.05      2309.05      1732.32
> >       4194304           10      4548.72      4548.98      4548.85      1758.64
> > [0] Abort: Got FATAL event 3
> > at line 796 in file ibv_channel_manager.c
> > rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> >
> > If, however, I define my mpdring to contain only Connect-X systems OR
> > only Arbel systems, IMB-MPI1 runs to completion.
> >
> > Can any suggest a workaround or is this a real bug with mvapich2?
> > > > -- > > Michael Heinz > > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania > > > > > > > From weiny2 at llnl.gov Wed Apr 9 08:38:39 2008 From: weiny2 at llnl.gov (weiny2 at llnl.gov) Date: Wed, 9 Apr 2008 08:38:39 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> Message-ID: <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> > On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: >> Hi Ira, >> >> On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: >> > As per Hal's comments change the alternate value for [leaf] HOQ to be >> > "infinity" when the user specifies a value larger than "infinity". >> >> Actually I would prefer original version of the patch. The main reason >> is that infinite packet life time is really dangerous thing - in case >> when a fabric is routed with credit loops (very common case with default >> min-hops routing) it leads to total fabric stuck and not just to some >> performance degradation. >> >> So I think it is safer to reject invalid value and to set the default >> (log an error, etc.i). As it was done in the original version of the >> patch. >> >> Hal, do you agree? > > Safer yes but I think it is less to the intent of the admin who just > doesn't understand the max value for this and that's why I proposed this > change. My preference is to max it out but it comes down to a judgment > call. There's a downside either way. What if we set it to 0x13? This would be the maximum value that will not "lock" up the fabric. We could also add to the error message that the admin needs to specify 0x14 if they specifically want "infinity" to be set? 
Ira From rdreier at cisco.com Wed Apr 9 09:16:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 09 Apr 2008 09:16:22 -0700 Subject: [ofa-general] ipath_kernel.h:1115: error: implicit declaration of function 'writeq' on rhel5 In-Reply-To: <1207750066.3303.28.camel@pc.ilinx> (Brian J. Murrell's message of "Wed, 09 Apr 2008 10:07:46 -0400") References: <1207750066.3303.28.camel@pc.ilinx> Message-ID: ipath doesn't work on any 32-bit architecture. The kernel Kconfig file has

    config INFINIBAND_IPATH
            tristate "QLogic InfiniPath Driver"
            depends on (PCI_MSI || HT_IRQ) && 64BIT && NET

but I guess the OFED build system doesn't enforce that. - R. From rdreier at cisco.com Wed Apr 9 09:18:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 09 Apr 2008 09:18:37 -0700 Subject: [ofa-general] ipath can work without MSI now? Message-ID: Given the commit below, does it make sense to change the Kconfig stuff

    config INFINIBAND_IPATH
            tristate "QLogic InfiniPath Driver"
            depends on (PCI_MSI || HT_IRQ) && 64BIT && NET

to remove the (PCI_MSI || HT_IRQ), since it seems your new HCA would still work on a non-MSI-enabled kernel?

    commit 9c7b278d87088350aaf9dfe0ad50afa15722dbf6
    Author: Dave Olson
    Date:   Tue Jan 8 11:50:18 2008 -0800

        IB/ipath: Fix check for no interrupts to reliably fallback to INTx

        Newer HCAs support MSI interrupts and also INTx interrupts. Fix
        the code so that INTx can be reliably enabled if MSI interrupts
        are not working.

        Signed-off-by: Dave Olson
        Signed-off-by: Roland Dreier

From dave.olson at qlogic.com Wed Apr 9 09:22:13 2008 From: dave.olson at qlogic.com (Dave Olson) Date: Wed, 9 Apr 2008 09:22:13 -0700 (PDT) Subject: [ofa-general] Re: ipath can work without MSI now?
In-Reply-To: References: Message-ID: On Wed, 9 Apr 2008, Roland Dreier wrote: | Given the commit below, does it make sense to change the Kconfig stuff | | config INFINIBAND_IPATH | tristate "QLogic InfiniPath Driver" | depends on (PCI_MSI || HT_IRQ) && 64BIT && NET | | to remove the (PCI_MSI || HT_IRQ), since it seems your new HCA would | still work on a non-MSI-enabled kernel? Not really, because it means the 6120 chips would not work, and the new 7220 still works better with MSI. I don't think the config mechanism can handle that as it stands unless we created a different driver. Or have I missed something that would cover that issue? The number of systems without MSI keeps dropping, so I think it's best to leave it as is. Dave Olson dave.olson at qlogic.com From hrosenstock at xsigo.com Wed Apr 9 09:30:32 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 09:30:32 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> Message-ID: <1207758632.15625.498.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 08:38 -0700, weiny2 at llnl.gov wrote: > > On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: > >> Hi Ira, > >> > >> On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: > >> > As per Hal's comments change the alternate value for [leaf] HOQ to be > >> > "infinity" when the user specifies a value larger than "infinity". > >> > >> Actually I would prefer original version of the patch. 
The main reason > >> is that infinite packet life time is really dangerous thing - in case > >> when a fabric is routed with credit loops (very common case with default > >> min-hops routing) it leads to total fabric stuck and not just to some > >> performance degradation. > >> > >> So I think it is safer to reject invalid value and to set the default > >> (log an error, etc.i). As it was done in the original version of the > >> patch. > >> > >> Hal, do you agree? > > > > Safer yes but I think it is less to the intent of the admin who just > > doesn't understand the max value for this and that's why I proposed this > > change. My preference is to max it out but it comes down to a judgment > > call. There's a downside either way. > > What if we set it to 0x13? This would be the maximum value that will not > "lock" up the fabric. We could also add to the error message that the > admin needs to specify 0x14 if they specifically want "infinity" to be > set? So disallow the setting to infinity ? 
-- Hal > Ira > > > From weiny2 at llnl.gov Wed Apr 9 09:36:09 2008 From: weiny2 at llnl.gov (weiny2 at llnl.gov) Date: Wed, 9 Apr 2008 09:36:09 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <1207758632.15625.498.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <1207758632.15625.498.camel@hrosenstock-ws.xsigo.com> Message-ID: <50692.128.15.244.131.1207758969.squirrel@127.0.0.1> > On Wed, 2008-04-09 at 08:38 -0700, weiny2 at llnl.gov wrote: >> > On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: >> >> Hi Ira, >> >> >> >> On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: >> >> > As per Hal's comments change the alternate value for [leaf] HOQ to >> be >> >> > "infinity" when the user specifies a value larger than "infinity". >> >> >> >> Actually I would prefer original version of the patch. The main >> reason >> >> is that infinite packet life time is really dangerous thing - in case >> >> when a fabric is routed with credit loops (very common case with >> default >> >> min-hops routing) it leads to total fabric stuck and not just to some >> >> performance degradation. >> >> >> >> So I think it is safer to reject invalid value and to set the default >> >> (log an error, etc.i). As it was done in the original version of the >> >> patch. >> >> >> >> Hal, do you agree? >> > >> > Safer yes but I think it is less to the intent of the admin who just >> > doesn't understand the max value for this and that's why I proposed >> this >> > change. My preference is to max it out but it comes down to a judgment >> > call. There's a downside either way. >> >> What if we set it to 0x13? This would be the maximum value that will >> not >> "lock" up the fabric. 
We could also add to the error message that the >> admin needs to specify 0x14 if they specifically want "infinity" to be >> set? > > So disallow the setting to infinity ? > No, if you want infinity you have to specify 0x14 (19) in the opensm.opts file. For example, specifying 100 will set the value to 0x13 and warn the user that if they want infinity they will have to specify it explicitly; ie head_of_queue_lifetime = 0x14 Ira From hrosenstock at xsigo.com Wed Apr 9 09:40:28 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 09:40:28 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <50692.128.15.244.131.1207758969.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <1207758632.15625.498.camel@hrosenstock-ws.xsigo.com> <50692.128.15.244.131.1207758969.squirrel@127.0.0.1> Message-ID: <1207759228.15625.502.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 09:36 -0700, weiny2 at llnl.gov wrote: > > On Wed, 2008-04-09 at 08:38 -0700, weiny2 at llnl.gov wrote: > >> > On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: > >> >> Hi Ira, > >> >> > >> >> On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: > >> >> > As per Hal's comments change the alternate value for [leaf] HOQ to > >> be > >> >> > "infinity" when the user specifies a value larger than "infinity". > >> >> > >> >> Actually I would prefer original version of the patch. The main > >> reason > >> >> is that infinite packet life time is really dangerous thing - in case > >> >> when a fabric is routed with credit loops (very common case with > >> default > >> >> min-hops routing) it leads to total fabric stuck and not just to some > >> >> performance degradation. 
> >> >> > >> >> So I think it is safer to reject invalid value and to set the default > >> >> (log an error, etc.i). As it was done in the original version of the > >> >> patch. > >> >> > >> >> Hal, do you agree? > >> > > >> > Safer yes but I think it is less to the intent of the admin who just > >> > doesn't understand the max value for this and that's why I proposed > >> this > >> > change. My preference is to max it out but it comes down to a judgment > >> > call. There's a downside either way. > >> > >> What if we set it to 0x13? This would be the maximum value that will > >> not > >> "lock" up the fabric. We could also add to the error message that the > >> admin needs to specify 0x14 if they specifically want "infinity" to be > >> set? > > > > So disallow the setting to infinity ? > > > > No, if you want infinity you have to specify 0x14 (19) in the opensm.opts > file. For example, specifying 100 will set the value to 0x13 and warn the > user that if they want infinity they will have to specify it explicitly; > ie head_of_queue_lifetime = 0x14 That's another choice but seems a little weird to me that 20 is infinite and >20 is set less than that but this is a judgment call. Not sure what others think. At this point, I have nothing more to add on this. -- Hal > Ira > > > From Brian.Murrell at Sun.COM Wed Apr 9 09:42:24 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Wed, 09 Apr 2008 12:42:24 -0400 Subject: [ofa-general] ipath_kernel.h:1115: error: implicit declaration of function 'writeq' on rhel5 In-Reply-To: References: <1207750066.3303.28.camel@pc.ilinx> Message-ID: <1207759344.3303.46.camel@pc.ilinx> On Wed, 2008-04-09 at 09:16 -0700, Roland Dreier wrote: > ipath doesn't work on any 32-bit architecture. Indeed, this is what I had discovered. Using {read,write}q is just not kosher on < 64bit. 
> The kernel Kconfig file has > > config INFINIBAND_IPATH > tristate "QLogic InfiniPath Driver" > depends on (PCI_MSI || HT_IRQ) && 64BIT && NET Indeed, I saw that too. > but I guess the OFED build system doesn't enforce that. That's my uninformed conclusion thus far too. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From sashak at voltaire.com Wed Apr 9 13:46:03 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 20:46:03 +0000 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> Message-ID: <20080409204603.GB20833@sashak.voltaire.com> On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: > > What if we set it to 0x13? This would be the maximum value that will not > "lock" up the fabric. We could also add to the error message that the > admin needs to specify 0x14 if they specifically want "infinity" to be > set? I think in the case when parameter value provided by user is wrong it is not easy to guess correctly what original wishes was. Probably we just need to add something like: ## valid values are <= 0x14 in config file template and reject any invalid values (I mean set to defaults)? 
Sasha From hrosenstock at xsigo.com Wed Apr 9 10:53:42 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 10:53:42 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <20080409204603.GB20833@sashak.voltaire.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> Message-ID: <1207763622.15625.506.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 20:46 +0000, Sasha Khapyorsky wrote: > On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: > > > > What if we set it to 0x13? This would be the maximum value that will not > > "lock" up the fabric. We could also add to the error message that the > > admin needs to specify 0x14 if they specifically want "infinity" to be > > set? > > I think in the case when parameter value provided by user is wrong it > is not easy to guess correctly what original wishes was. Probably we > just need to add something like: > > ## valid values are <= 0x14 > > in config file template and reject any invalid values (I mean set to > defaults)? That's consistent with other invalid settings so maybe also add info on valid values into opensm.opts to try to reduce the chance of this occurring. 
-- Hal > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From bs at q-leap.de Wed Apr 9 10:56:21 2008 From: bs at q-leap.de (Bernd Schubert) Date: Wed, 9 Apr 2008 19:56:21 +0200 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <47FBD40E.70407@mellanox.co.il> References: <200804041147.27565.bs@q-leap.de> <20080408183113.GA18308@sashak.voltaire.com> <47FBD40E.70407@mellanox.co.il> Message-ID: <200804091956.21840.bs@q-leap.de> Hello Yevgeny! On Tuesday 08 April 2008 22:22:38 Yevgeny Kliteynik wrote: > Sasha Khapyorsky wrote: > > Hi Bernd, > > > > [adding Yevgeny..] > > > > On 11:35 Tue 08 Apr , Bernd Schubert wrote: > >> On Tuesday 08 April 2008 03:44:06 Sasha Khapyorsky wrote: > >>> Hi Bernd, > >>> > >>> On 11:47 Fri 04 Apr , Bernd Schubert wrote: > >>>> opensm-3.2.1 logs some error messages like this: > >>>> > >>>> Apr 04 00:00:08 325114 [4580A960] 0x01 -> > >>>> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for > >>>> node 0x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep > >>>> sampling list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path > >>>> Dump of 3 hop path: Path = 0,1,14,13 > >>>> > >>>> > >>>> From ibnetdiscover output I see port13 of this switch is a > >>>> switch-interconnect (sorry, I don't know what the correct > >>>> name/identifier for switches within switches): > >>>> > >>>> [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid > >>>> 263 4xSDR > >>> > >>> It is possible that port was DOWN during first subnet discovery. > >>> Finally everything should be initialized after those messages. Isn't it > >>> the case here?
> >> I think everything is initialized, but I don't think the port was down > >> during first subnet discovery, since the port is on a spine board (I > >> called it 'inter') to another switch system. We also never added any > >> leaves to the switches. > > > > It is an interesting phenomenon then. > > > > Yevgeny, are you aware of such an issue with Flextronics switches? > > I've seen it before. It means that during discovery some switch has > answered the NodeInfo query, but then when OpenSM started to query for > PortInfo for each port of this switch, the switch didn't answer for some > (or all) ports. > > I think that this might happen if a switch has just been "plugged in", > and internal switches are doing autonegotiation - they are bringing > ports up and down when determining whether a link is SDR or DDR. > > In any case, this "phenomenon" should disappear after a couple of > dozen seconds, once the autonegotiation phase is over. > > Bernd, am I close? > We never plugged in additional switches, and the messages appear on each opensm startup. However, the messages appear only once after opensm is started, and then never again. Would the switches do SDR/DDR negotiation on opensm startup? And since we are at SDR/DDR, it also might be related. Hal and I are also discussing an odd SDR/DDR ibnetdiscover problem: ibnetdiscover thinks some ports are at SDR, while ibstatus and perfquery report these ports are at DDR.
Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From weiny2 at llnl.gov Wed Apr 9 11:01:36 2008 From: weiny2 at llnl.gov (weiny2 at llnl.gov) Date: Wed, 9 Apr 2008 11:01:36 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <20080409204603.GB20833@sashak.voltaire.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> Message-ID: <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> > On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: >> >> What if we set it to 0x13? This would be the maximum value that will >> not >> "lock" up the fabric. We could also add to the error message that the >> admin needs to specify 0x14 if they specifically want "infinity" to be >> set? > > I think in the case when parameter value provided by user is wrong it > is not easy to guess correctly what original wishes was. Probably we > just need to add something like: > > ## valid values are <= 0x14 > > in config file template and reject any invalid values (I mean set to > defaults)? The config file comments already mention this:

    "# The code of maximal time a packet can wait at the head of\n"
    "# transmission queue.\n"
    "# The actual time is 4.096usec * 2^\n"
    "# The value 0x14 disables this mechanism\n"
    "head_of_queue_lifetime 0x%02x\n\n"

But I guess "disables" should be "infinity" to make this more clear. I will leave it up to you as to which patch you want. As Hal said, I can see either side. Both patches warn the user that the value they submitted was not valid, and subsequently what value OpenSM is using instead. Whatever you want to do Sasha...
:-) Ira From swise at opengridcomputing.com Wed Apr 9 11:06:58 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 09 Apr 2008 13:06:58 -0500 Subject: [ofa-general] Directions for verbs API extensions In-Reply-To: References: <47FA3D60.3020905@opengridcomputing.com> Message-ID: <47FD05C2.7090001@opengridcomputing.com> Roland Dreier wrote: > > > There are a few discrepancies between the iWARP and IB verbs that we > > > need to decide on how we want to handle: > > > > > > - In IB-BMME, L_Keys and R_Keys are split up so that there is an > > > 8-bit "key" that is owned by the consumer. As far as I know, there > > > is no analogous concept defined for iWARP STags; is there any point > > > in supporting this IB-only feature (which is optional even in the > > > IB spec)? > > > In fact there is an 8b key for stags as well. The stag is composed of > > a 3B index allocated by the driver/hw, and a 1B key specified by the > > consumer. None of this is exposed in the linux rdma interface at this > > point and cxgb3 always sets the key to 0xff. > > Oops, I completely missed that in the iWARP verbs spec. Yes, the IB and > iWARP verbs agree on the semantics here, so the only issue is that the > "key" portion of L_Keys/R_Keys is only supported by IB devices that do > BMME. So we can expose this in the API without too much trouble. > > > The chelsio driver supports the iwarp bind_mw SQ WR via the current > > API. In fact the current API implies that this call is actually a SQ > > operation anyway: > > > /** > > > * ib_bind_mw - Posts a work request to the send queue of the specified > > > * QP, which binds the memory window to the given address range and > > > * remote access attributes. > > > > How is the current bind_mw API not valid or correct for iwarp MWs? > > Other than being a different call than ib_post_send()? > > That's the only issue. The main impact is that you can't submit an MW > bind as part of a list of send WRs. I guess it's not too severe an > issue. 
I don't have any strong feelings here, except that eliminating > the separate bind_mw call might be a little cleaner. On the other hand > it adds more conditional branches to post_send so maybe it's a net loss. > BTW: looks like /usr/include/infiniband/verbs.h is missing an ibv_bind_mw() function. The struct and context ops are there, but no API func. This means there is no bind_mw support for user mode at this point. So we don't have to worry about backwards compatibility... Steve. From bs at q-leap.de Wed Apr 9 11:11:16 2008 From: bs at q-leap.de (Bernd Schubert) Date: Wed, 9 Apr 2008 20:11:16 +0200 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> Message-ID: <200804092011.17361.bs@q-leap.de> On Wednesday 09 April 2008 20:01:36 weiny2 at llnl.gov wrote: > > On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: > >> What if we set it to 0x13? This would be the maximum value that will > >> not > >> "lock" up the fabric. We could also add to the error message that the > >> admin needs to specify 0x14 if they specifically want "infinity" to be > >> set? > > > > I think in the case when parameter value provided by user is wrong it > > is not easy to guess correctly what original wishes was. Probably we > > just need to add something like: > > > > ## valid values are <= 0x14 > > > > in config file template and reject any invalid values (I mean set to > > defaults)?
> > The config file comments already mention this: > > "# The code of maximal time a packet can wait at the head of\n" > "# transmission queue.\n" > "# The actual time is 4.096usec * 2^\n" > "# The value 0x14 disables this mechanism\n" > "head_of_queue_lifetime 0x%02x\n\n" > > But I guess "disables" should be "infinity" to make this more clear. When I first read this, and when increasing the value from 0x12 to 0x13 didn't help, I thought: fine, if 0x14 disables it, I'll just set it to 0x15. What about "# The maximum is 0x14, which will disable this mechanism.\n" Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From hrosenstock at xsigo.com Wed Apr 9 11:19:07 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 11:19:07 -0700 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <200804091956.21840.bs@q-leap.de> References: <200804041147.27565.bs@q-leap.de> <20080408183113.GA18308@sashak.voltaire.com> <47FBD40E.70407@mellanox.co.il> <200804091956.21840.bs@q-leap.de> Message-ID: <1207765147.15625.510.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 19:56 +0200, Bernd Schubert wrote: > Hello Yevgeny! > > On Tuesday 08 April 2008 22:22:38 Yevgeny Kliteynik wrote: > > Sasha Khapyorsky wrote: > > > Hi Bernd, > > > > > > [adding Yevgeny..] > > > > > > On 11:35 Tue 08 Apr , Bernd Schubert wrote: > > >> On Tuesday 08 April 2008 03:44:06 Sasha Khapyorsky wrote: > > >>> Hi Bernd, > > >>> > > >>> On 11:47 Fri 04 Apr , Bernd Schubert wrote: > > >>>> opensm-3.2.1 logs some error messages like this: > > >>>> > > >>>> Apr 04 00:00:08 325114 [4580A960] 0x01 -> > > >>>> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for > > >>>> node 0 > > >>>> x000b8cffff002ba2(SW_pfs1_leaf4) port 13.
Adding to light sweep > > >>> sampling list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path > > >>> Dump of 3 hop path: Path = 0,1,14,13 > > >>> > > >>> > > >>> From ibnetdiscover output I see port13 of this switch is a > > >>> switch-interconnect (sorry, I don't know what the correct > > >>> name/identifier for switches within switches): > > >>> > > >>> [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid > > >>> 263 4xSDR > > >>> > > >>> It is possible that port was DOWN during first subnet discovery. > > >>> Finally everything should be initialized after those messages. Isn't it > > >>> the case here? > > >> > > >> I think everything is initialized, but I don't think the port was down > > >> during first subnet discovery, since the port is on a spine board (I > > >> called it 'inter') to another switch system. We also never added any > > >> leaves to the switches. > > > > > > It is an interesting phenomenon then. > > > > > > Yevgeny, are you aware of such an issue with Flextronics switches? > > > > I've seen it before. It means that during discovery some switch has > > answered the NodeInfo query, but then when OpenSM started to query for > > PortInfo for each port of this switch, the switch didn't answer for some > > (or all) ports. > > > > I think that this might happen if a switch has just been "plugged in", > > and internal switches are doing autonegotiation - they are bringing > > ports up and down when determining whether a link is SDR or DDR. > > > > In any case, this "phenomenon" should disappear after a couple of > > dozen seconds, once the autonegotiation phase is over. > > > > Bernd, am I close? > > > > We never plugged in additional switches, and the messages appear on each opensm > startup. However, the messages appear only once after opensm was started, but > then never again. Would the switches do an SDR/DDR negotiation on opensm > startup? Links perform physical negotiation independently of the SM.
> And since we are at SDR/DDR, it also might be related. Hal and I are also > discussing an odd SDR/DDR ibnetdiscover problem. Ibnetdiscover just thinks > some ports are at SDR, while ibstatus and perfquery do tell these ports are > at DDR. I'm not sure the link speed is "stable". -- Hal > Thanks, > Bernd > > From hrosenstock at xsigo.com Wed Apr 9 11:20:19 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 11:20:19 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <200804092011.17361.bs@q-leap.de> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> Message-ID: <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 20:11 +0200, Bernd Schubert wrote: > On Wednesday 09 April 2008 20:01:36 weiny2 at llnl.gov wrote: > > > On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: > > >> What if we set it to 0x13? This would be the maximum value that will > > >> not > > >> "lock" up the fabric. We could also add to the error message that the > > >> admin needs to specify 0x14 if they specifically want "infinity" to be > > >> set? > > > > > > I think in the case when parameter value provided by user is wrong it > > > is not easy to guess correctly what original wishes was. Probably we > > > just need to add something like: > > > > > > ## valid values are <= 0x14 > > > > > > in config file template and reject any invalid values (I mean set to > > > defaults)? > > > > The config file comments already mention this: > > > > "# The code of maximal time a packet can wait at the head of\n" > > "# transmission queue.\n" > > "# The actual time is 4.096usec * 2^\n" > > "# The value 0x14 disables this mechanism\n" > > "head_of_queue_lifetime 0x%02x\n\n" > > > > But I guess "disables" should be "infinity" to make this more clear. 
> > When I first read this and when increasing the value from 0x12 to 0x13 didn't > help, I thought fine, if 0x14 disables it I just set it to 0x15. > What about > > "# The maximum is 0x14, which will disable this mechanism.\n" Yes, that's what I was trying to suggest. -- Hal > > > Thanks, > Bernd > From pwatkins at sicortex.com Wed Apr 9 11:27:16 2008 From: pwatkins at sicortex.com (Peter Watkins) Date: Wed, 09 Apr 2008 14:27:16 -0400 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? Message-ID: <47FD0A84.8020404@sicortex.com> > I know it's a long shot, but has anyone tried using OFED on > a kernel with 64Kbyte pages? We have 64K pages on our MIPS machines, and OFED 1.2.5 is used to connect to a disk array. Haven't tested lots of configurations, nor used other OFED paths, but it works. 01:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev a0) Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) Flags: bus master, fast devsel, latency 0, IRQ 23 Memory at 818000000 (64-bit, non-prefetchable) [size=1M] Memory at 810000000 (64-bit, prefetchable) [size=8M] Memory at 800000000 (64-bit, prefetchable) [size=256M] Capabilities: [40] Power Management version 2 Capabilities: [48] Vital Product Data Capabilities: [90] Message Signalled Interrupts: Mask- 64bit+ Queue=0/5 Enable- Capabilities: [84] MSI-X: Enable- Mask- TabSize=32 Capabilities: [60] Express Endpoint, MSI 00 Kernel driver in use: ib_mthca Kernel modules: ib_mthca From hrosenstock at xsigo.com Wed Apr 9 11:37:58 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 11:37:58 -0700 Subject: [ofa-general] Re: running opensm 3.0.3 on 4000+ node system In-Reply-To: <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> 
<200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> Message-ID: <1207766278.15625.523.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > I'm trying to run opensm on a 4000+ node system, Which version ? Do you mean 3.0.3 (or 3.0.13) ? > and seem to be having difficulties in keeping the opensm around. > When I attach to the process w/ strace it does: > --- > # strace -p 5921 > Process 5921 attached - interrupt to quit restart_syscall(<... resuming interrupted call ...>) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > ... > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, > +++ killed by SIGSEGV +++ > --- > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > --- > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > --- > > I have lots of data in the osm.log as you can imagine ... I don't know offhand what I should be looking at/for. What's towards the end of the log ? 
-- Hal > Thanks, > -cdm > From holt at sgi.com Wed Apr 9 11:55:00 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 9 Apr 2008 13:55:00 -0500 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <20080409144401.GT10133@duo.random> References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> Message-ID: <20080409185500.GT11364@sgi.com> On Wed, Apr 09, 2008 at 04:44:01PM +0200, Andrea Arcangeli wrote: > BTW, how did you implement invalidate_page? As this? > > invalidate_page() { > invalidate_range_begin() > invalidate_range_end() > } Essentially, I did the work of each step without releasing and reacquiring locks. > If yes, I prefer to remind you that normally invalidate_range_begin is > always called before zapping the pte. In the invalidate_page case > instead, invalidate_range_begin is called _after_ the pte has been > zapped already. > > Now there's no problem if the pte is established and the spte isn't > established. But it must never happen that the spte is established and > the pte isn't established (with page-pinning that means unswappable > memlock leak, without page-pinning it would mean memory corruption). I am not sure I follow what you are saying. Here is a very terse breakdown of how PFNs flow through xpmem's structures. We have a PFN table associated with our structure describing a grant. We use get_user_pages() to acquire information for that table and we fill the table in under a mutex. Remote hosts (on the same numa network so they have direct access to the users memory) have a PROXY version of that structure. It is filled out in a similar fashion to the local table. PTEs are created for the other processes while holding the mutex for this table (either local or remote). During the process of faulting, we have a simple linked list of ongoing faults that is maintained whenever the mutex is going to be released. Our version of a zap_page_range is called recall_PFNs. 
The recall process grabs the mutex, scans the faulters list for any that cover the range, and marks them as needing a retry. It then calls zap_page_range for any processes that have attached the granted memory to clear out their page tables. Finally, we release the mutex and proceed. The locking is more complex than this, but that is the essential idea. What that means for mmu_notifiers is we have a single reference on the page for all the remote processes using it. When the callout to invalidate_page() is made, we will still have processes with that PTE in their page tables and potentially TLB entries. When we return from the invalidate_page() callout, we will have removed all those page table entries, we will have no in-progress page table or tlb insertions that will complete, and we will have released all our references to the page. Does that meet your expectations? Thanks, Robin > > So the range_begin must serialize against the secondary mmu page fault > so that it can't establish the spte on a pte that was zapped by the > rmap code after get_user_pages/follow_page returned. I think your > range_begin already does that so you should be ok, but I wanted to > remind you about this slight difference in implementing invalidate_page as > I suggested above in the previous email, just to be sure ;). > > This is the race you must guard against in invalidate_page: > > > CPU0 CPU1 > try_to_unmap on page > secondary mmu page fault > get_user_pages()/follow_page found a page > ptep_clear_flush > invalidate_page() > invalidate_range_begin() > invalidate_range_end() > return from invalidate_page > establish spte on page > return from secondary mmu page fault > > If your range_begin already serializes in a hard way against the > secondary mmu page fault, my previously suggested "trivial" > implementation for invalidate_page should work just fine, and this > saves 1 branch for each try_to_unmap_one compared to the emm > implementation.
The branch check is inlined and it checks against the > mmu_notifier_head that is the hot cacheline; no new cacheline is > checked, just one branch is saved, and so it's worth it IMHO even if it > doesn't provide any other advantage if you implement it the way above. From hrosenstock at xsigo.com Wed Apr 9 13:35:53 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 13:35:53 -0700 Subject: [ofa-general] Re: running opensm 3.0.3 on 4000+ node system In-Reply-To: <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> Message-ID: <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: Which OpenSM was run before ? Also, which kernel is being used and what is meant by both ofed 1.1 and 1.2 drivers ? > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > --- Can you try without infinite SMPs ? Is this how it was run before ?
-- Hal > -cdm > From hrosenstock at xsigo.com Wed Apr 9 13:39:28 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 13:39:28 -0700 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <01388EFD6F94FE4787C7CB970014DF670C406E12B2@ES01SNLNT.srn.sandia.gov> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207766278.15625.523.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E12B2@ES01SNLNT.srn.sandia.gov> Message-ID: <1207773568.15625.547.camel@hrosenstock-ws.xsigo.com> Hi Christopher, On Wed, 2008-04-09 at 13:14 -0600, Maestas, Christopher Daniel wrote: > Hello Hal, > > -----Original Message----- > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > Sent: Wednesday, April 09, 2008 12:38 PM > To: Maestas, Christopher Daniel > Cc: general at lists.openfabrics.org > Subject: Re: running opensm 3.0.3 on 4000+ node system > > On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > > I'm trying to run opensm on a 4000+ node system, > > Which version ? Do you mean 3.0.3 (or 3.0.13) ? > > cdm> Version 3.0.13 ... you're right on that > # rpm -q opensm > opensm-3.0.3-6.el5_1.1 > --- > Apr 9 12:49:53 HOST OpenSM[3295]: /var/log/osm.log log file opened > Apr 9 12:49:53 HOST OpenSM[3295]: OpenSM Rev:openib-3.0.13 > Apr 9 12:49:53 HOST kernel: user_mad: process opensm did not enable P_Key index support. > Apr 9 12:49:53 HOST kernel: user_mad: Documentation/infiniband/user_mad.txt has info on the new ABI. 
> Apr 9 12:49:59 HOST OpenSM[3295]: Entering MASTER state > Apr 9 12:50:02 HOST OpenSM[3295]: Errors during initialization Your subnet has errors :-( > Apr 9 12:50:16 HOST OpenSM[3295]: SUBNET UP > Apr 9 12:50:22 HOST kernel: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready > Apr 9 12:50:30 HOST OpenSM[3295]: Errors during initialization > Apr 9 12:51:05 HOST last message repeated 2 times > Apr 9 12:52:17 HOST last message repeated 3 times > Apr 9 12:53:27 HOST last message repeated 3 times > ... > > > and seem to be having difficulties in keeping the opensm around. > > When I attach to the process w/ strace it does: > > --- > > # strace -p 5921 > > Process 5921 attached - interrupt to quit restart_syscall(<... resuming interrupted call ...>) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > ... > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, > > +++ killed by SIGSEGV +++ > > --- > > > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > > --- > > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > > --- > > > > I have lots of data in the osm.log as you can imagine ... I don't know offhand what I should be looking at/for. > > What's towards the end of the log ? > > cdm> > I rebooted the node ... 
then brought ib0, then restarted opensmd ... It died when file got this big: > # ls -l osm.log -h > -rw-r--r-- 1 root root 3.2G Apr 9 13:12 osm.log > # tail osm.log > Apr 09 13:12:31 439877 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0089 Port 12 TID:0x00000000000032d3 > Apr 09 13:12:31 440370 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00D0 Port 3 TID:0x0000000000007480 > Apr 09 13:12:31 440669 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00B3 Port 7 TID:0x00000000000058dd > Apr 09 13:12:31 440987 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0082 Port 21 TID:0x000000000000285a > Apr 09 13:12:31 441228 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00E8 Port 10 TID:0x00000000000095a2 > Apr 09 13:12:31 441579 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x004A Port 1 TID:0x0000000000010d29 > Apr 09 13:12:31 441847 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0063 Port 24 TID:0x000000000000e40c > Apr 09 13:12:31 442130 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x000A Port 23 TID:0x000000000006fca2 > Apr 09 13:12:31 442469 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0009 Port 18 TID:0x0000000000059fc4 > Apr 09 13:12:31 442710 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0009 Port 17 TID:0x0000000000059fc5 Those are flow control watchdog errors. 
Any special opensm options set in the option file or are you running with the defaults ? -- Hal From hrosenstock at xsigo.com Wed Apr 9 13:41:30 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 13:41:30 -0700 Subject: [ofa-general] Re: running opensm 3.0.3 on 4000+ node system In-Reply-To: <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207773690.15625.548.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 13:35 -0700, Hal Rosenstock wrote: > On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > > Which OpenSM was run before ? I just saw your response on this. Sorry I missed it. -- Hal > Also, which kernel is being used and what > is meant by both ofed 1.1 and 1.2 drivers ? > > > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > > --- > > Can you try without infinite SMPs ? Is this how it was run before ? 
> > -- Hal > > > -cdm > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Wed Apr 9 13:56:04 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 13:56:04 -0700 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <1207773568.15625.547.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207766278.15625.523.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E12B2@ES01SNLNT.srn.sandia.gov> <1207773568.15625.547.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207774564.15625.552.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 13:39 -0700, Hal Rosenstock wrote: > Hi Christopher, > > On Wed, 2008-04-09 at 13:14 -0600, Maestas, Christopher Daniel wrote: > > Hello Hal, > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > > Sent: Wednesday, April 09, 2008 12:38 PM > > To: Maestas, Christopher Daniel > > Cc: general at lists.openfabrics.org > > Subject: Re: running opensm 3.0.3 on 4000+ node system > > > > On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > > > I'm trying to run opensm on a 4000+ node system, > > > > Which version ? Do you mean 3.0.3 (or 3.0.13) ? > > > > cdm> Version 3.0.13 ... 
you're right on that > > # rpm -q opensm > > opensm-3.0.3-6.el5_1.1 > > --- > > Apr 9 12:49:53 HOST OpenSM[3295]: /var/log/osm.log log file opened > > Apr 9 12:49:53 HOST OpenSM[3295]: OpenSM Rev:openib-3.0.13 > > Apr 9 12:49:53 HOST kernel: user_mad: process opensm did not enable P_Key index support. > > Apr 9 12:49:53 HOST kernel: user_mad: Documentation/infiniband/user_mad.txt has info on the new ABI. > > Apr 9 12:49:59 HOST OpenSM[3295]: Entering MASTER state > > Apr 9 12:50:02 HOST OpenSM[3295]: Errors during initialization > > Your subnet has errors :-( > > > Apr 9 12:50:16 HOST OpenSM[3295]: SUBNET UP > > Apr 9 12:50:22 HOST kernel: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready > > Apr 9 12:50:30 HOST OpenSM[3295]: Errors during initialization > > Apr 9 12:51:05 HOST last message repeated 2 times > > Apr 9 12:52:17 HOST last message repeated 3 times > > Apr 9 12:53:27 HOST last message repeated 3 times > > ... > > > > > and seem to be having difficulties in keeping the opensm around. > > > When I attach to the process w/ strace it does: > > > --- > > > # strace -p 5921 > > > Process 5921 attached - interrupt to quit restart_syscall(<... resuming interrupted call ...>) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > ... 
> > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, > > > +++ killed by SIGSEGV +++ > > > --- > > > > > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > > > --- > > > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > > > --- > > > > > > I have lots of data in the osm.log as you can imagine ... I don't know offhand what I should be looking at/for. > > > > What's towards the end of the log ? > > > > cdm> > > I rebooted the node ... then brought ib0, then restarted opensmd ... 
It died when file got this big: > > # ls -l osm.log -h > > -rw-r--r-- 1 root root 3.2G Apr 9 13:12 osm.log > > # tail osm.log > > Apr 09 13:12:31 439877 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0089 Port 12 TID:0x00000000000032d3 > > Apr 09 13:12:31 440370 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00D0 Port 3 TID:0x0000000000007480 > > Apr 09 13:12:31 440669 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00B3 Port 7 TID:0x00000000000058dd > > Apr 09 13:12:31 440987 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0082 Port 21 TID:0x000000000000285a > > Apr 09 13:12:31 441228 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00E8 Port 10 TID:0x00000000000095a2 > > Apr 09 13:12:31 441579 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x004A Port 1 TID:0x0000000000010d29 > > Apr 09 13:12:31 441847 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0063 Port 24 TID:0x000000000000e40c > > Apr 09 13:12:31 442130 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x000A Port 23 TID:0x000000000006fca2 > > Apr 09 13:12:31 442469 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0009 Port 18 TID:0x0000000000059fc4 > > Apr 09 13:12:31 442710 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0009 Port 17 TID:0x0000000000059fc5 > > Those are flow control watchdog errors. 
One possible explanation for this: the SM could be (mis)configuring mismatched OperVLs at the two ends of these links. Not sure why. -- Hal > Any special opensm options set > in the option file or are you running with the defaults ? > > -- Hal From hrosenstock at xsigo.com Wed Apr 9 14:17:50 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 14:17:50 -0700 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <01388EFD6F94FE4787C7CB970014DF670C406E1301@ES01SNLNT.srn.sandia.gov> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E1301@ES01SNLNT.srn.sandia.gov> Message-ID: <1207775870.15625.567.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 15:13 -0600, Maestas, Christopher Daniel wrote: > I think we may have fixed it: --- > 3998 pts/0 Sl 1:47 /usr/sbin/opensm -maxsmps 15 -t 200 -f /var/log/osm.log -g 0 > -- > > I changed maxsmps to 15 (from default of 0 => unlimited) and it seems to be working now. > That is the same value we use for the cisco host based sm. Yes, an infinite value could overrun the un-flow-controlled VL15 buffers in the switches. This should probably be noted somewhere in the documentation/man pages. > --- > Apr 9 14:43:17 HOST OpenSM[3998]: /var/log/osm.log log file opened > Apr 9 14:43:17 HOST OpenSM[3998]: OpenSM Rev:openib-3.0.13 > Apr 9 14:43:17 HOST kernel: user_mad: process opensm did not enable P_Key index support. > Apr 9 14:43:17 HOST kernel: user_mad: Documentation/infiniband/user_mad.txt has info on the new ABI.
> Apr 9 14:43:30 HOST OpenSM[3998]: Entering MASTER state > Apr 9 14:43:54 HOST OpenSM[3998]: SUBNET UP > --- > > The log file is not growing like crazy anymore ... So it is the SM which caused this by mismatching peer port OpVLs. -- Hal > I did forget to mention we are running a new mellanox firmware on the HCA too and switches ... been about 2 years since we last tested. :) > I'm looking for the previous method in which it was run, and I don't recall making this change before. It could be due to all the other changes since then. But now I know how to get it going and my work is hopefully archived in this mailing list. ;) > > Thanks, > -cdm > > -----Original Message----- > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > Sent: Wednesday, April 09, 2008 2:36 PM > To: Maestas, Christopher Daniel > Cc: general at lists.openfabrics.org > Subject: Re: running opensm 3.0.3 on 4000+ node system > > On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > > Which OpenSM was run before ? Also, which kernel is being used and what is meant by both ofed 1.1 and 1.2 drivers ? > > > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > > --- > > Can you try without infinite SMPs ? Is this how it was run before ? > > -- Hal > > > -cdm > > > > > From Brian.Murrell at Sun.COM Wed Apr 9 14:51:51 2008 From: Brian.Murrell at Sun.COM (Brian J. 
Murrell) Date: Wed, 09 Apr 2008 17:51:51 -0400 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 Message-ID: <1207777911.3303.88.camel@pc.ilinx> The OFED 1.3 release I downloaded identifies a SLES 9 SP3 kernel and assigns a backport patchset for it: 2.6.5-7.*) echo 2.6.5_sles9_sp3 ;; But I don't seem to have that patchset in my release: $ ls ~/rpm/BUILD/ofa_kernel-1.3/kernel_patches/backport 2.6.11 2.6.15_ubuntu606 2.6.18-EL5.1 2.6.22_suse10_3 2.6.11_FC4 2.6.16 2.6.18_FC6 2.6.23 2.6.12 2.6.16_sles10 2.6.18_suse10_2 2.6.9_U4 2.6.13 2.6.16_sles10_sp1 2.6.19 2.6.9_U5 2.6.13_suse10_0_u 2.6.16_sles10_sp2 2.6.20 2.6.9_U6 2.6.14 2.6.17 2.6.21 2.6.15 2.6.18 2.6.22 There seem to be other identified releases missing such as 2.6.9_U{2,3} (not that I care about those particular releases). Is this release incomplete? b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From sashak at voltaire.com Wed Apr 9 17:50:10 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 10 Apr 2008 00:50:10 +0000 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <1207775870.15625.567.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E1301@ES01SNLNT.srn.sandia.gov> <1207775870.15625.567.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080410005010.GD21190@sashak.voltaire.com> On 14:17 Wed 09 Apr , Hal Rosenstock wrote: > On Wed, 2008-04-09 at 15:13 -0600, Maestas, Christopher Daniel wrote: > > I think we may 
have fixed it: > > --- > > 3998 pts/0 Sl 1:47 /usr/sbin/opensm -maxsmps 15 -t 200 -f /var/log/osm.log -g 0 > > -- > > > > I changed maxsmps to 15 (from default of 0 => unlimited) and it seems to be working now. > > That is the same value we use for the cisco host based sm. > > Yes, an infinite value could overrun the unflow controlled VL15 buffers > in the switches. Even if not - it overflows mad response matching table in vendor layer (there are 4k+ nodes and only 1k entries in the table). In recent version (master) this table size can be redefined with OSM_UMAD_MAX_PENDING environment variable. Sasha From hrosenstock at xsigo.com Wed Apr 9 14:54:11 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 14:54:11 -0700 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <20080410005010.GD21190@sashak.voltaire.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E1301@ES01SNLNT.srn.sandia.gov> <1207775870.15625.567.camel@hrosenstock-ws.xsigo.com> <20080410005010.GD21190@sashak.voltaire.com> Message-ID: <1207778051.15625.580.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-10 at 00:50 +0000, Sasha Khapyorsky wrote: > On 14:17 Wed 09 Apr , Hal Rosenstock wrote: > > On Wed, 2008-04-09 at 15:13 -0600, Maestas, Christopher Daniel wrote: > > > I think we may have fixed it: > > > --- > > > 3998 pts/0 Sl 1:47 /usr/sbin/opensm -maxsmps 15 -t 200 -f /var/log/osm.log -g 0 > > > -- > > > > > > I changed maxsmps to 15 (from default of 0 => unlimited) and it seems to be working now. > > > That is the same value we use for the cisco host based sm. 
> > > > Yes, an infinite value could overrun the unflow controlled VL15 buffers > > in the switches. > > Even if not - it overflows mad response matching table in vendor layer > (there are 4k+ nodes and only 1k entries in the table). In recent > version (master) this table size can be redefined with > OSM_UMAD_MAX_PENDING environment variable. Right; I forgot about that but not sure why that wouldn't have happened on his earlier use of OpenSM though. -- Hal > > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From tziporet at dev.mellanox.co.il Wed Apr 9 15:51:59 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 09 Apr 2008 15:51:59 -0700 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 In-Reply-To: <1207777911.3303.88.camel@pc.ilinx> References: <1207777911.3303.88.camel@pc.ilinx> Message-ID: <47FD488F.3000405@mellanox.co.il> Brian J. Murrell wrote: > The OFED 1.3 release I downloaded identifies a SLES 9 SP3 kernel and > OFED 1.3 does not support SLES9. If you need this OS you can use OFED 1.2.5.5. Tziporet
From krkumar2 at in.ibm.com Wed Apr 9 21:06:57 2008 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Thu, 10 Apr 2008 09:36:57 +0530 Subject: [ofa-general] Test programs supporting RNIC's. In-Reply-To: <47FCCCB7.2080407@opengridcomputing.com> Message-ID: Hi Steve, Steve Wise wrote on 04/09/2008 07:33:35 PM: > Krishna, if you are interested, you could add cma support to the rest of > these. I can help by answering questions and/or testing things... If no one else is already doing this, I can start doing this in the background. Will follow up with you if I need any help. Thanks, - KK
From PHF at zurich.ibm.com Thu Apr 10 04:32:57 2008 From: PHF at zurich.ibm.com (Philip Frey1) Date: Thu, 10 Apr 2008 13:32:57 +0200 Subject: [ofa-general] librdmacm.a for 2.6.24 missing Message-ID: Hi, I have installed OFED 1.3 for my 2.6.24 kernel. Before I was running a different kernel (2.6.24.3). Before the change I had a static rdma cm library in /usr/lib64/librdmacm.a. Now this library is missing. Can anybody help me get that static library back? Many thanks, Philip -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylor at hpc.ufl.edu Thu Apr 10 05:49:04 2008 From: taylor at hpc.ufl.edu (Charles Taylor) Date: Thu, 10 Apr 2008 08:49:04 -0400 Subject: [ofa-general] Test programs supporting RNIC's. In-Reply-To: References: Message-ID: We might be interested in helping with this as well. Charlie Taylor UF HPC Center On Apr 10, 2008, at 12:06 AM, Krishna Kumar2 wrote: > Hi Steve, > > Steve Wise wrote on 04/09/2008 > 07:33:35 PM: > >> Krishna, if you are interested, you could add cma support to the >> rest of >> these. I can help by answering questions and/or testing things... > > If no one else is already doing this, I can start doing this in the > background. Will follow up with you if I need any help.
> > Thanks, > > - KK From Brian.Murrell at Sun.COM Thu Apr 10 06:38:06 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Thu, 10 Apr 2008 09:38:06 -0400 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 In-Reply-To: <47FD488F.3000405@mellanox.co.il> References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> Message-ID: <1207834686.3303.117.camel@pc.ilinx> On Wed, 2008-04-09 at 15:51 -0700, Tziporet Koren wrote: > > OFED 1.3 does not support SLES9 > If you need this OS you can use OFED 1.2.5.5 That's fair enough. But why not have the configuration process actually stop and announce it when it detects that it's operating on an unsupported platform? b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From shibata at lampreynetworks.com Thu Apr 10 07:27:15 2008 From: shibata at lampreynetworks.com (Joel Shibata) Date: Thu, 10 Apr 2008 14:27:15 GMT Subject: [ofa-general] madrpc_init and resetting performance counters Message-ID: <200804101027456.SM08116@[66.94.32.4]> I'm attempting to query the performance counters on each IB device/port and then reset these counters. To do so I'm using madrpc_init to initialize each port on every poll. Doing so produces the following warning/panic: ibwarn: [19949] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (Too many open files): is ib_umad module loaded? ibpanic: [19949] madrpc_init: can't init UMAD library: (Too many open files) I've verified that libibumad rpms are installed.
Calling madrpc_init only at the front end of my polling allows me to reset only the port that was initialized last. Does anyone have some insight into how I gather/reset each port without having to call madrpc_init each time I poll that port? Joel Shibata Software Developer Lamprey Networks -------------- next part -------------- An HTML attachment was scrubbed... URL: From hrosenstock at xsigo.com Thu Apr 10 07:32:50 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 10 Apr 2008 07:32:50 -0700 Subject: [ofa-general] madrpc_init and resetting performance counters In-Reply-To: <200804101027456.SM08116@[66.94.32.4]> References: <200804101027456.SM08116@[66.94.32.4]> Message-ID: <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> Joel, On Thu, 2008-04-10 at 14:27 +0000, Joel Shibata wrote: > I'm attempting to query the performance counters on each IB > device/port and then reset these counters. To do so I'm using > madrpc_init to initialize each port on every poll. Doing so produces > the following warning/panic: > > ibwarn: [19949] umad_init: can't read ABI version > from /sys/class/infiniband_mad/abi_version (Too many open files): is > ib_umad module loaded? > ibpanic: [19949] madrpc_init: can't init UMAD library: (Too many open > files) > > I've verified that libibumad rpms are installed. Calling madrpc_init > only at the front end of my polling allows me to reset only the > port that was initialized last. Does anyone have some insight into > how I gather/reset each port without having to call madrpc_init each > time I poll that port? There's already a tool which does what you are describing at a high level: perfquery -R and also scripts for the entire subnet: ibclearcounters or ibclearerrors (if you just want to clear the error counters).
-- Hal > Joel Shibata > Software Developer > Lamprey Networks From tziporet at dev.mellanox.co.il Thu Apr 10 08:21:18 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 10 Apr 2008 08:21:18 -0700 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 In-Reply-To: <1207834686.3303.117.camel@pc.ilinx> References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> <1207834686.3303.117.camel@pc.ilinx> Message-ID: <47FE306E.5010003@mellanox.co.il> Brian J. Murrell wrote: > On Wed, 2008-04-09 at 15:51 -0700, Tziporet Koren wrote: > >> OFED 1.3 does not support SLES9 >> If you need this OS you can use OFED 1.2.5.5 >> > > That's fair enough. But why not have the configuration process actually > stop and announce it when it detects that it's operating on an unsupported > platform? > > You are right, Vlad - can you fix it Tziporet
From chu11 at llnl.gov Thu Apr 10 11:17:07 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 11:17:07 -0700 Subject: [ofa-general] Re: [RFC][PATCH 0/4] opensm: using conventional config file In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207851427.7695.123.camel@cardanus.llnl.gov> Hey Sasha, I suddenly thought about this. If the /var/cache/opensm/opensm.opts file is no longer readable (and presumably people will not know about it b/c it is not documented anywhere), how will users know how to write the opensm.conf? Will opensm distribute a "template" .conf file with all values initially commented out? (I think this is the best idea). Al On Wed, 2008-04-09 at 01:10 +0000, Sasha Khapyorsky wrote: > Hi, > > This is attempt to make some order with OpenSM configuration. Now it > will use conventional (similar to another programs which may have > configuration) config ($sysconfig/etc/opensm/opensm.conf) file instead > of option cache file. Config file for some startup scripts should go > away. Option '-c' is preserved - it can be useful for config file > template generation, but OpenSM will not try to read option cache file. > > This is RFC yet. In addition to this we will need to update scripts and > man pages. > > Any feedback? Thoughts? > > Sasha -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From chu11 at llnl.gov Thu Apr 10 14:10:15 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 14:10:15 -0700 Subject: [ofa-general] [OpenSM] [PATCH 0/3] New "port-offsetting" option to updn/minhop routing Message-ID: <1207861815.7695.160.camel@cardanus.llnl.gov> Hey Sasha, I was going to submit this after I had a chance to test on one of our big clusters to see if it worked 100% right.
But my final testing has been delayed (for a month now!). Ira said some folks from Sonoma were interested in this, so I'll go ahead and post it. This is a patch for something I call "port_offsetting" (name/description of the option is open to suggestion). Basically, we want to move to using lmc > 0 on our clusters b/c some of the newer MPI implementations take advantage of multiple lids and have shown faster performance when lmc > 0. The problem is that those users that do not use the newer MPI implementations, or do not run their code in a way that can take advantage of multiple lids, suffer great performance degradation in their code. We determined that the primary issue is what we started calling "base lid alignment". Here's a simple example. Assume LMC = 2 and we are trying to route the lids of 4 ports (A,B,C,D). Those lids are: port A - 1,2,3,4 port B - 5,6,7,8 port C - 9,10,11,12 port D - 13,14,15,16 Suppose forwarding of these lids goes through 4 switch ports. If we cycle through the ports like updn/minhop currently do, we would see something like this. switch port 1: 1, 5, 9, 13 switch port 2: 2, 6, 10, 14 switch port 3: 3, 7, 11, 15 switch port 4: 4, 8, 12, 16 Note that the base lid of each port (lids 1, 5, 9, 13) goes through only 1 port of the switch. Thus a user that uses only the base lid is using only 1 port out of the 4 ports they could be using. Leading to terrible performance. We want to get this instead. switch port 1: 1, 8, 11, 14 switch port 2: 2, 5, 12, 15 switch port 3: 3, 6, 9, 16 switch port 4: 4, 7, 10, 13 where base lids are distributed in a more even manner. In order to do this, we (effectively) iterate through all ports like before, but we iterate starting at a different index depending on the number of paths we have routed thus far. On one of our clusters, some testing has shown when we run w/ LMC=1 and 1 task per node, mpibench (AlltoAll tests) range from 10-30% worse than when LMC=0 is used. 
With LMC=2, mpibench tends to be 50-70% worse in performance than with LMC=0. With the port offsetting option, the performance degradation ranges from 1-5% worse than LMC=0. I am currently at a loss why I cannot get it to be equal to LMC=0, but 1-5% is small enough to not make users mad :-) The part I haven't been able to test yet is whether newer MPIs that do take advantage of LMC > 0 run equally when my port_offsetting is turned off and on. That's the part I still haven't been able to test. Thanks, look forward to your comments, Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From chu11 at llnl.gov Thu Apr 10 14:11:03 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 14:11:03 -0700 Subject: [ofa-general] [OpenSM] [PATCH 1/3] add p_log pointer to osm_switch_t Message-ID: <1207861863.7695.162.camel@cardanus.llnl.gov> Nothing too fancy in this patch. I wanted to output some debug stuff into the log, and needed to get the p_log pointer passed into osm_switch_recommend_path(). Adding it into osm_switch_t seemed the easiest/best way. If you think we should get it into osm_switch_recommend_path() a different way, PLMK. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-add-log-pointer-to-osm_switch_t.patch Type: text/x-patch Size: 3847 bytes Desc: not available URL: From chu11 at llnl.gov Thu Apr 10 14:11:37 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 14:11:37 -0700 Subject: [ofa-general] [OpenSM] [PATCH 2/3] add port_offsetting option Message-ID: <1207861897.7695.163.camel@cardanus.llnl.gov> Nothing too fancy in this patch. Just added the port_offsetting option, config file option, manpage documentation, etc. Again, I welcome comments on the text + the option name.
"port_offsetting" was the best name I could come up with :-) Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-add-port_offsetting-option.patch Type: text/x-patch Size: 6706 bytes Desc: not available URL: From chu11 at llnl.gov Thu Apr 10 14:12:09 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 14:12:09 -0700 Subject: [ofa-general] [OpenSM] [PATCH 3/3] implement port_offsetting option Message-ID: <1207861929.7695.165.camel@cardanus.llnl.gov> This is the primary patch that fiddles with the path recommendation code. A few notes: 1) b/c I want to keep track of how many remote destinations there can be, the 'remote_guids' array now stores all remote destinations, not just the ones we have already forwarded to. 2) b/c I may need to free memory, I now "goto Exit" instead of just calling 'return' many times. 3) Although the option is called 'port_offsetting', I actually "offset" both the remote destination I send to and the port pointing towards that remote destination. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0003-implement-port_offsetting.patch Type: text/x-patch Size: 15048 bytes Desc: not available URL: From xptveowrp at bobghiotohomes.com Thu Apr 10 21:10:34 2008 From: xptveowrp at bobghiotohomes.com (Felicia Koehler) Date: Fri, 11 Apr 2008 13:10:34 +0900 Subject: [ofa-general] Re: Re: Hi Message-ID: <01c89bd5$6d0d8100$612bac79@xptveowrp> Forget about s~xual and ED problems! Zillions of men all over the world use our cure - Ciagra and Vialis! Buy it in our online store NOW! FOR SITE LINK VIEW ATTACHED DETAILS Friendly customer support and worldwide shipping! Choose Our Cure! 
From diego.guella at sircomtech.com Thu Apr 10 23:38:19 2008 From: diego.guella at sircomtech.com (Diego Guella) Date: Fri, 11 Apr 2008 08:38:19 +0200 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> <1207834686.3303.117.camel@pc.ilinx> <47FE306E.5010003@mellanox.co.il> Message-ID: <003a01c89b9e$a277d700$05c8a8c0@DIEGO> ----- Original Message ----- From: "Tziporet Koren" > Brian J. Murrell wrote: >> On Wed, 2008-04-09 at 15:51 -0700, Tziporet Koren wrote: >> >>> OFED 1.3 does not support SLES9 >>> If you need this OS you can use OFED 1.2.5.5 >>> >> >> That's fair enough. But why not have the configuration process actually >> stop and announce it when it detects that it's operating on an unsupported >> platform? >> >> > You are right, > Vlad - can you fix it > > Tziporet I think it would be better to print a warning, and ask the user if the process should continue or not. In the past I installed OFED 1.0 on Suse Linux 9.3 Professional (an unsupported operating system), and the only change I made was to the installation script, to make it recognize SL 9.3Pro as SLES. OFED 1.0 (opensm, ipoib, SDP, verbs) then ran without problems. Actually, it would be much better if the config process stopped, printed a warning, printed a list of supported operating systems, and then let the user choose which operating system OFED should be compiled for. Diego
From Brian.Murrell at Sun.COM Fri Apr 11 05:18:40 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 11 Apr 2008 08:18:40 -0400 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 In-Reply-To: <003a01c89b9e$a277d700$05c8a8c0@DIEGO> References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> <1207834686.3303.117.camel@pc.ilinx> <47FE306E.5010003@mellanox.co.il> <003a01c89b9e$a277d700$05c8a8c0@DIEGO> Message-ID: <1207916320.3303.196.camel@pc.ilinx> On Fri, 2008-04-11 at 08:38 +0200, Diego Guella wrote: > > I think it would be better to print a warning, and ask the user if the process should continue or not. Why, when the build is going to fail ultimately with some kind of compiler error?
> In the past I installed OFED 1.0 on Suse Linux 9.3 Professional (an unsupported operating system), and the only change I done was to > the installation script, to make it recognize SL 9.3Pro as SLES. That's different. The non-support didn't result in a build failure, complete with compiler errors and all. > Actually, it would be much better if the config process stops, prints a warning, print a list of supported operating systems, and > then let the user choose which operating system should OFED be compiled for. Why? When the kernel I am trying to compile for is SLES9 and recognized as such and it is known to result in a complete build failure? What could I possibly answer to the prompt to make it succeed? This is not a case of a mis-detection. It correctly detects the kernel source as SLES9. It's a simple matter that there is no support in OFED 1.3 for SLES9 and the result is a completely broken build. Now, if you had patches that make it work, send them upstream and then the supported status of OFED 1.3 could change. But lacking that, no amount of pausing and prompting is going to fix the basic issue here. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From swise at opengridcomputing.com Fri Apr 11 07:47:17 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 11 Apr 2008 09:47:17 -0500 Subject: [ofa-general] [ANNOUNCE] libcxgb3-1.1.5 released Message-ID: <47FF79F5.9000407@opengridcomputing.com> All, I've released version 1.1.5 of libcxgb3. The changes include 2 minor fixes, and some house-keeping to make the release easily integrate into distros. Thanks Roland for helping me see the light. :) Steve. From Brian.Murrell at Sun.COM Fri Apr 11 08:59:22 2008 From: Brian.Murrell at Sun.COM (Brian J. 
Murrell) Date: Fri, 11 Apr 2008 11:59:22 -0400 Subject: [ofa-general] iw_cxgb3.ko needs unknown symbol dev2t3cdev Message-ID: <1207929562.3303.222.camel@pc.ilinx> When I run a depmod -ae -F on my resulting installation of OFED 1.3 I get the following error: WARNING: /lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080411125046smp/kernel/drivers/infiniband/hw/cxgb3/iw_cxgb3.ko needs unknown symbol dev2t3cdev What I can't seem to figure out is why this symbol is not being exported by kernel/drivers/net/cxgb3/cxgb3.ko. The source shows it being defined and exported in drivers/net/cxgb3/cxgb3_offload.c: /* Get the t3cdev associated with a net_device */ struct t3cdev *dev2t3cdev(struct net_device *dev) { const struct port_info *pi = netdev_priv(dev); return (struct t3cdev *)pi->adapter; } EXPORT_SYMBOL(dev2t3cdev); However the resulting cxgb3.ko clearly does not have it defined: # nm /lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080411125046smp/kernel/drivers/net/cxgb3/cxgb3.ko | grep dev2t3cdev # My build output shows the build of cxgb3_offload.c and the linking of it into cxgb3.o: gcc -Wp,-MD,/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/.cxgb3_offload.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.1.1/include \ -include include/linux/autoconf.h \ -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ \ \ -I/cache/build/BUILD/ofa_kernel-1.3/include \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ -I/usr/local/include/scst \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ -Iinclude \ \ -D__KERNEL__ \ -include include/linux/autoconf.h \ -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ \ \ -I/cache/build/BUILD/ofa_kernel-1.3/include \ 
-I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ -I/usr/local/include/scst \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ -Iinclude \ \ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Wstrict-prototypes -Wundef -Werror-implicit-function-declaration -Os -mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(cxgb3_offload)" -D"KBUILD_MODNAME=KBUILD_STR(cxgb3)" -c -o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/.tmp_cxgb3_offload.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.c ld -m elf_x86_64 -r -o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_main.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/ael1002.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/vsc8211.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/t3_hw.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/mc5.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/xgmac.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/sge.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/l2t.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.o Any ideas what the problem could be? b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From swise at opengridcomputing.com Fri Apr 11 09:12:56 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 11 Apr 2008 11:12:56 -0500 Subject: [ofa-general] iw_cxgb3.ko needs unknown symbol dev2t3cdev In-Reply-To: <1207929562.3303.222.camel@pc.ilinx> References: <1207929562.3303.222.camel@pc.ilinx> Message-ID: <47FF8E08.9080704@opengridcomputing.com> Brian J. Murrell wrote: > When I run a depmod -ae -F on my resulting installation of > OFED 1.3 I get the following error: > > WARNING: /lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080411125046smp/kernel/drivers/infiniband/hw/cxgb3/iw_cxgb3.ko needs unknown symbol dev2t3cdev > > What I can't seem to figure out is why this symbol is not being exported > by kernel/drivers/net/cxgb3/cxgb3.ko. The source shows it being defined > and exported in drivers/net/cxgb3/cxgb3_offload.c: > > /* Get the t3cdev associated with a net_device */ > struct t3cdev *dev2t3cdev(struct net_device *dev) > { > const struct port_info *pi = netdev_priv(dev); > > return (struct t3cdev *)pi->adapter; > } > > EXPORT_SYMBOL(dev2t3cdev); > > However the resulting cxgb3.ko clearly does not have it defined: > > # nm /lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080411125046smp/kernel/drivers/net/cxgb3/cxgb3.ko | grep dev2t3cdev > # > > My build output shows the build of cxgb3_offload.c and the linking of it > into cxgb3.o: > > gcc -Wp,-MD,/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/.cxgb3_offload.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.1.1/include \ > -include include/linux/autoconf.h \ > -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ > -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ > \ > \ > -I/cache/build/BUILD/ofa_kernel-1.3/include \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ > -I/usr/local/include/scst 
\ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ > -Iinclude \ > \ > -D__KERNEL__ \ > -include include/linux/autoconf.h \ > -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ > -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ > \ > \ > -I/cache/build/BUILD/ofa_kernel-1.3/include \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ > -I/usr/local/include/scst \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ > -Iinclude \ > \ > -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Wstrict-prototypes -Wundef -Werror-implicit-function-declaration -Os -mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(cxgb3_offload)" -D"KBUILD_MODNAME=KBUILD_STR(cxgb3)" -c -o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/.tmp_cxgb3_offload.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.c > ld -m elf_x86_64 -r -o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_main.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/ael1002.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/vsc8211.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/t3_hw.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/mc5.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/xgmac.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/sge.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/l2t.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.o > > Any ideas what 
the problem could be? > > I believe the cxgb3 module you are looking at in /lib/modules/`uname -r`/kernel/* isn't the one you are building. ofed installs its modules in /lib/modules/`uname -r`/updates/*. The cxgb3 module in /lib/modules/`uname -r`/kernel/* is from your kernel tree I think. Are you doing a 'make install' from the ofed tree? Are you using only the ofed tree or are you also using chelsio's TOE kit? Thanks, Steve.
From dpn at isomerica.net Fri Apr 11 13:50:30 2008 From: dpn at isomerica.net (Dan Noe) Date: Fri, 11 Apr 2008 16:50:30 -0400 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> Message-ID: <47FFCF16.6020302@isomerica.net> On 4/10/2008 10:32, Hal Rosenstock wrote: >> I've verified that libibumad rpms are installed. Only calling >> madrpc_init at the front end of my polling only allows me to reset the >> port that was initialized last. Does anyone have some insight into >> how I gather/reset each port without having to call madrpc_init each >> time I poll that port? > > There's already a tool which does what you are describing at a high > level: perfquery -R and also scripts for the entire subnet: > ibclearcounters or ibclearerrors (if you just want to clear the error > counters).
Our software is trying to get around the limitation of 32-bit IB counters - unfortunately the counters get "stuck" at 0xFFFFFFFF instead of wrapping so to avoid data loss it is necessary to poll them periodically, keep a running total (in a 64 bit counter :) and reset the counters. We're trying to avoid fork()/exec() since the resets need to happen fairly frequently. So calling out to perfquery to reset the counter is suboptimal. The solution Joel had mentioned was to use madrpc_init() and then call port_performance_reset() to reset the port. But madrpc_init keeps a static file descriptor (mad_portid) that is used for subsequent calls (such as is eventually used when port_performance_reset() is called). And, there does not seem to be any method to close this file descriptor. So, it is impossible to extend this method to multiple devices (or even multiple ports). With a single call to madrpc_init one can perpetually reset the performance counters in the polling loop but this approach doesn't work with multiple devices. If madrpc_init is called more than once, it leaks a file descriptor. There is a reference in the man page for umad_init (which is called) to calling umad_done but this doesn't seem to work: int umad_done(void) { TRACE("umad_done"); /* FIXME - verify that all ports are closed */ return 0; } I did notice there is a way to access the static file descriptor using madrpc_portid(). I assume this could be used to close the file descriptor opened by madrpc_init but it isn't clear if there are other resources that need cleanup. We're going to take this approach and see where it gets us. Any further insight is greatly appreciated. Cheers, Dan -- Dan Noe (dpn at lampreynetworks.com) Software Engineer Lamprey Networks, Inc.
From ralph.campbell at qlogic.com Fri Apr 11 14:08:22 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 11 Apr 2008 14:08:22 -0700 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <47FFCF16.6020302@isomerica.net> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> Message-ID: <1207948102.8715.86.camel@brick.pathscale.com> Also, be aware that opensm now tries to poll the performance counters and keep a total. If you have more than one thing in the system trying to keep track of the total, they will conflict and each only see part of the total counts. On Fri, 2008-04-11 at 16:50 -0400, Dan Noe wrote: > On 4/10/2008 10:32, Hal Rosenstock wrote: > >> I've verified that libibumad rpms are installed. Only calling > >> madrpc_init at the front end of my polling only allows me to reset the > >> port that was initialized last. Does anyone have some insight into > >> how I gather/reset each port without having to call madrpc_init each > >> time I poll that port? > > > > There's already a tool which does what you are describing at a high > > level: perfquery -R and also scripts for the entire subnet: > > ibclearcounters or ibclearerrors (if you just want to clear the error > > counters). > > Our software is trying to get around the limitation of 32-bit IB > counters - unfortunately the counters get "stuck" at 0xFFFFFFFF instead > of wrapping so to avoid data loss it is neccessary to poll them > periodically, keep a running total (in a 64 bit counter :) and reset the > counters. > > We're trying to avoid fork()/exec() since the resets need to happen > fairly frequently. So calling out to perfquery to reset the counter is > suboptimal. > > The solution Joel had mentioned was to use madrpc_init() and then call > port_performance_reset() to reset the port. 
But madrpc_init keeps a > static file descriptor (mad_portid) that is used for subsequent calls > (such as is eventually used when port_performance_reset() is called). > And, there does not seem to be any method to close this file descriptor. > > So, it is impossible to extend this method to multiple devices (or even > multiple ports). With a single call to madrpc_init one can perpetually > reset the performance counters in the polling loop but this approach > doesn't work with multiple devices. If madrpc_init is called more than > once, it leaks a file descriptor. > > There is a reference in the man page for umad_init (which is called) to > calling umad_done but this doesn't seem to work: > > int > umad_done(void) > { > TRACE("umad_done"); > /* FIXME - verify that all ports are closed */ > return 0; > } > > I did notice there is a way to access the static file descriptor using > madrpc_portid(). I assume this could be used to close the file > descriptor opened by madrpc_init but it isn't clear if there are other > resources that need cleanup. We're going to take this approach and see > where it gets us. > > Any further insight is greatly appreciated. 
> > Cheers, > Dan > From arlin.r.davis at intel.com Fri Apr 11 14:07:52 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 11 Apr 2008 14:07:52 -0700 Subject: [ofa-general] [PATCH][v2] dapl openib_cma: fix hca query to use correct max_rd_atom values Message-ID: <001301c89c18$1b074ab0$bb258686@amr.corp.intel.com> Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 13 +++++++------ 1 files changed, 7 insertions(+), 6 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index fcd8163..a7ba3d6 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -467,10 +467,10 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->hardware_version_major = dev_attr.hw_ver; ia_attr->max_eps = dev_attr.max_qp; ia_attr->max_dto_per_ep = dev_attr.max_qp_wr; - ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom; - ia_attr->max_rdma_read_out = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_in = dev_attr.max_res_rd_atom; + ia_attr->max_rdma_read_out = dev_attr.max_qp_init_rd_atom; ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom; - ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_init_rd_atom; ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE; ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE; ia_attr->max_evds = dev_attr.max_cq; @@ -492,7 +492,7 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->max_iov_segments_per_rdma_write = dev_attr.max_sge; /* save rd_atom for peer validation during connect requests */ hca_ptr->ib_trans.max_rdma_rd_in = dev_attr.max_qp_rd_atom; - hca_ptr->ib_trans.max_rdma_rd_out = dev_attr.max_qp_rd_atom; + hca_ptr->ib_trans.max_rdma_rd_out = dev_attr.max_qp_init_rd_atom; #ifdef DAT_EXTENSIONS ia_attr->extension_supported = DAT_EXTENSION_IB; ia_attr->extension_version = DAT_IB_EXTENSION_VERSION; @@ -505,10 +505,11 @@ DAT_RETURN 
dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->max_evds, ia_attr->max_evd_qlen ); dapl_log(DAPL_DBG_TYPE_UTIL, "dapl_query_hca: msg %llu rdma %llu iov's %d" - " lmr %d rmr %d rd_io %d inline=%d\n", + " lmr %d rmr %d rd_in,out %d,%d inline=%d\n", ia_attr->max_mtu_size, ia_attr->max_rdma_size, ia_attr->max_iov_segments_per_dto, ia_attr->max_lmrs, ia_attr->max_rmrs, ia_attr->max_rdma_read_per_ep_in, + ia_attr->max_rdma_read_per_ep_out, hca_ptr->ib_trans.max_inline_send); } @@ -521,7 +522,7 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ep_attr->max_recv_iov = dev_attr.max_sge; ep_attr->max_request_iov = dev_attr.max_sge; ep_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom; - ep_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom; + ep_attr->max_rdma_read_out= dev_attr.max_qp_init_rd_atom; ep_attr->max_rdma_read_iov= dev_attr.max_sge; ep_attr->max_rdma_write_iov= dev_attr.max_sge; dapl_log(DAPL_DBG_TYPE_UTIL, -- 1.5.2.5 From dpn at isomerica.net Fri Apr 11 14:14:25 2008 From: dpn at isomerica.net (Dan Noe) Date: Fri, 11 Apr 2008 17:14:25 -0400 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <1207948102.8715.86.camel@brick.pathscale.com> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> <1207948102.8715.86.camel@brick.pathscale.com> Message-ID: <47FFD4B1.4020707@isomerica.net> On 4/11/2008 17:08, Ralph Campbell wrote: > Also, be aware that opensm now tries to poll the performance > counters and keep a total. If you have more than one thing > in the system trying to keep track of the total, they will > conflict and each only see part of the total counts. Yeah, this has been noted as a caveat. The need to reset the counters is a real pain. Is there a way to access the counters maintained by OpenSM without some fork/exec/parse mess? Cheers, Dan -- Dan Noe (dpn at lampreynetworks.com) Software Engineer Lamprey Networks, Inc. 
From hrosenstock at xsigo.com Fri Apr 11 14:30:09 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 11 Apr 2008 14:30:09 -0700 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <47FFD4B1.4020707@isomerica.net> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> <1207948102.8715.86.camel@brick.pathscale.com> <47FFD4B1.4020707@isomerica.net> Message-ID: <1207949409.15625.936.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-11 at 17:14 -0400, Dan Noe wrote: > On 4/11/2008 17:08, Ralph Campbell wrote: > > Also, be aware that opensm now tries to poll the performance > > counters and keep a total. If you have more than one thing > > in the system trying to keep track of the total, they will > > conflict and each only see part of the total counts. > > Yeah, this has been noted as a caveat. The need to reset the counters > is a real pain. > > Is there a way to access the counters maintained by OpenSM without some > fork/exec/parse mess? Yes; Ira's the best one to speak to the options here as he did this work. -- Hal > Cheers, > Dan > From rdreier at cisco.com Fri Apr 11 15:03:45 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 11 Apr 2008 15:03:45 -0700 Subject: [ofa-general] [ANNOUNCE] libcxgb3-1.1.5 released In-Reply-To: <47FF79F5.9000407@opengridcomputing.com> (Steve Wise's message of "Fri, 11 Apr 2008 09:47:17 -0500") References: <47FF79F5.9000407@opengridcomputing.com> Message-ID: I've uploaded libipathverbs and libcxgb3 packages to my Ubuntu PPA: deb http://ppa.launchpad.net/roland.dreier/ubuntu hardy main deb-src http://ppa.launchpad.net/roland.dreier/ubuntu hardy main (and similar for gutsy), and started the process of getting those packages into the Debian archive (so they should automatically become a part of Ubuntu 8.10). 
If I have some spare time I'll work on getting packages into Fedora, but I would be happy to let someone else do that too...
From weiny2 at llnl.gov Fri Apr 11 15:32:03 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 11 Apr 2008 15:32:03 -0700 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <47FFD4B1.4020707@isomerica.net> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> <1207948102.8715.86.camel@brick.pathscale.com> <47FFD4B1.4020707@isomerica.net> Message-ID: <20080411153203.7c387452.weiny2@llnl.gov> On Fri, 11 Apr 2008 17:14:25 -0400 Dan Noe wrote: > On 4/11/2008 17:08, Ralph Campbell wrote: > > Also, be aware that opensm now tries to poll the performance > > counters and keep a total. If you have more than one thing > > in the system trying to keep track of the total, they will > > conflict and each only see part of the total counts. > > Yeah, this has been noted as a caveat. The need to reset the counters > is a real pain. > > Is there a way to access the counters maintained by OpenSM without some > fork/exec/parse mess? > Yes, assuming you have the perfmgr enabled; OpenSM has 2 ways of getting the counters out of the Performance Manager. a) use the console to dump the data to a file. b) write your own "plugin" to OpenSM and every time the perfmgr gets new data it will call your plugin. What you do from there is entirely up to you. Method A ======== Specify a dump file in the opensm.opts config file. # # Event DB Options # # Dump file to dump the events to event_db_dump_file /var/log/opensm_port_counters.log Log into the console and use the "perfmgr dump_counters" command: OpenSM $ perfmgr dump_counters Your data will be in "/var/log/opensm_port_counters.log". This file will be overwritten each time you run dump_counters. Method B ======== Look in the header opensm/osm_event_plugin.h for details on the interface.
Once you have a plugin compiled it can be loaded by the event_plugin_name opensm.opts option: # # Event Plugin Options # event_plugin_name opensmskummeeplugin The interface will be called each time there is new data available. We are using a plugin called opensmskummeeplugin[*] which puts all the data into a MySQL DB ready for the cluster monitoring tool Skummee[#] to put it on a web page for our operators. Also to get you started there is a sample plugin in OpenSM "osmeventplugin". Hope this helps, Ira [*] I hope to get this on a web page very soon. It has been approved for opensource by the lab... ;-) I don't know if it is appropriate to put in OFED due to its dependence on MySQL and Skummee. [#] https://sourceforge.net/project/screenshots.php?group_id=162032
From jgunthorpe at obsidianresearch.com Sat Apr 12 23:26:25 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Sun, 13 Apr 2008 00:26:25 -0600 Subject: [ofa-general] More responder_resources problems Message-ID: <20080413062625.GF23483@obsidianresearch.com> Hey Sean, I was just looking at tuning the responder_resources and I'm not quite sure what the intent of your implementation is regarding this.. Right now I'm mostly looking at userspace through libibcm, but in-kernel and librdmacm seem to have similar issues. I suppose this is related to your recent changesets 3eb99a28f41392f8555977aa12a345d251d218b3 (librdmacm) and 5851bb893e5bb87150817c180ccddcf4e78db1b6 (kernel).. Basically, it seems to me that the negotiation protocol for responder_resources/initiator_depth that is envisioned in IBA is not implemented.. So my expectation on how the spec outlines this should work is that the requesting side does essentially: ibv_query_device(verbs,&devAttr); req.responder_resources = devAttr.max_qp_rd_atom; req.initiator_depth = devAttr.max_qp_init_rd_atom; When making the req (assuming it wants the maximum). The passive side should then take req.initiator_depth, limit it to its devAttr.max_qp_rd_atom (and layer a client limit on top of that) and assign it to max_dest_rd_atomic on its QP, and also assign it to rep.responder_resources. Next, the passive side should take req.responder_resources, limit it to devAttr.max_qp_init_rd_atom (and again layer a client limit on top of that), and assign it to max_rd_atomic on its QP, and return it in rep.initiator_depth. The active side should, generally, use the form above and use the values in the rep to program its max_rd_atomic and max_dest_rd_atomic.
I can't find any of this in any of the cm libraries - and this is the sort of thing I was expecting to find in kernel cm.c, since other than letting the client on the passive side specify lower limits there really isn't much latitude here. The particular change you introduced to support DAPL strikes me as just strange; overriding the incoming initiator_depth with the passive side's responder_resources choice and then not returning that change in the rep makes no sense to me at all and could cause a slowdown since the two ends are now mismatched. (Assuming that max_dest_rd_atomic corresponds to responder resources and that max_rd_atomic corresponds to initiator depth as discussed in 11.2.4.3, Dotan: It would be nice if ibv_modify_qp(3) used the terms from IBA to describe them ..) What do you think? Thanks, -- Jason Gunthorpe (780)4406067x832 Chief Technology Officer, Obsidian Research Corp Edmonton, Canada From eli at dev.mellanox.co.il Sun Apr 13 00:22:01 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Sun, 13 Apr 2008 10:22:01 +0300 Subject: [ofa-general] [PATCH] IB/mlx4: Fix race when detaching a QP from a MCG Message-ID: <1208071321.9534.2.camel@mtls03> >From 9c725ff918d026e2765e053e5f09c51ee82e0282 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Thu, 10 Apr 2008 11:47:54 +0300 Subject: [PATCH] IB/mlx4: Fix race when detaching a QP from a MCG When detaching the last QP from an MCG entry, we need to make sure that at no time is an entry with zero QPs linked into the list of MCGs for the corresponding hash index. This patch also removes an unnecessary MCG read when attaching a QP requires allocation of a new entry in the AMGM.
Signed-off-by: Eli Cohen Found by: Mellanox regression team --- drivers/net/mlx4/mcg.c | 12 +++--------- 1 files changed, 3 insertions(+), 9 deletions(-) diff --git a/drivers/net/mlx4/mcg.c b/drivers/net/mlx4/mcg.c index a99e772..57f7f1f 100644 --- a/drivers/net/mlx4/mcg.c +++ b/drivers/net/mlx4/mcg.c @@ -190,10 +190,6 @@ int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]) } index += dev->caps.num_mgms; - err = mlx4_READ_MCG(dev, index, mailbox); - if (err) - goto out; - memset(mgm, 0, sizeof *mgm); memcpy(mgm->gid, gid, 16); } @@ -301,12 +297,10 @@ int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]) mgm->qp[loc] = mgm->qp[i - 1]; mgm->qp[i - 1] = 0; - err = mlx4_WRITE_MCG(dev, index, mailbox); - if (err) - goto out; - - if (i != 1) + if (i != 1) { + err = mlx4_WRITE_MCG(dev, index, mailbox); goto out; + } if (prev == -1) { /* Remove entry from MGM */ -- 1.5.5 From eli at dev.mellanox.co.il Sun Apr 13 00:23:15 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Sun, 13 Apr 2008 10:23:15 +0300 Subject: [ofa-general] IB/mlx4: fix code comment Message-ID: <1208071395.9534.4.camel@mtls03> >From 35dce8d2ebd3f525fe9ef92e3d8e803adde6170d Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Thu, 10 Apr 2008 16:18:04 +0300 Subject: [PATCH] IB/mlx4: fix code comment mlx4 hardware does not support external DDR memory. Moreover, UAR area (BAR 2) can change depending on FW version. Signed-off-by: Eli Cohen --- drivers/net/mlx4/main.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 7cfbe75..f2fe14a 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -736,8 +736,7 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) } /* - * Check for BARs. We expect 0: 1MB, 2: 8MB, 4: DDR (may not - * be present) + * Check for BARs. 
We expect 0: 1MB */ if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM) || pci_resource_len(pdev, 0) != 1 << 20) { -- 1.5.5
From erezz at voltaire.com Mon Apr 14 03:01:51 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 13:01:51 +0300 Subject: [ofa-general] [PATCH v2] IB/iSER: Release connection resources when receiving a RDMA_CM_EVENT_DEVICE_REMOVAL event In-Reply-To: <47FB489C.6030507@voltaire.com> References: <47FB489C.6030507@voltaire.com> Message-ID: <48032B8F.1030504@voltaire.com> When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should release the connection resources. This is necessary when the IB HCA module is unloaded while open-iscsi is still running.
Currently, iSER just initiates a BUG() call. Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iser_verbs.c | 5 +---- 1 files changed, 1 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 993f0a8..d19cfe6 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -473,11 +473,8 @@ static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *eve iser_connect_error(cma_id); break; case RDMA_CM_EVENT_DISCONNECTED: - iser_disconnected_handler(cma_id); - break; case RDMA_CM_EVENT_DEVICE_REMOVAL: - iser_err("Device removal is currently unsupported\n"); - BUG(); + iser_disconnected_handler(cma_id); break; default: iser_err("Unexpected RDMA CM event (%d)\n", event->event); -- 1.5.3.6
From diego.guella at sircomtech.com Fri Apr 11 06:01:03 2008 From: diego.guella at sircomtech.com (Diego Guella) Date: Fri, 11 Apr 2008 15:01:03 +0200 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> <1207834686.3303.117.camel@pc.ilinx> <47FE306E.5010003@mellanox.co.il> <003a01c89b9e$a277d700$05c8a8c0@DIEGO> <1207916320.3303.196.camel@pc.ilinx> Message-ID: <000d01c89e1c$cda64d00$05c8a8c0@DIEGO> ----- Original Message ----- >From: "Brian J. Murrell" >On Fri, 2008-04-11 at 08:38 +0200, Diego Guella wrote: >> >> I think it would be better to print a warning, and ask the user if the process should continue or not. > >Why, when the build is going to fail ultimately with some kind of >compiler error? > >> In the past I installed OFED 1.0 on Suse Linux 9.3 Professional (an unsupported operating system), and the only change I made was >> to >> the installation script, to make it recognize SL 9.3Pro as SLES. > >That's different. The non-support didn't result in a build failure, >complete with compiler errors and all. > >> Actually, it would be much better if the config process stops, prints a warning, prints a list of supported operating systems, and >> then lets the user choose which operating system OFED should be compiled for. > >Why? When the kernel I am trying to compile for is SLES9 and recognized >as such and it is known to result in a complete build failure? What >could I possibly answer to the prompt to make it succeed? > >This is not a case of a mis-detection. It correctly detects the kernel >source as SLES9. It's a simple matter that there is no support in OFED >1.3 for SLES9 and the result is a completely broken build. You're right. This is a different scenario. In this case the build is known to fail. In my case the config script prevented me from building, but the build was possible.
>Now, if you had patches that make it work, send them upstream and then >the supported status of OFED 1.3 could change. But lacking that, no >amount of pausing and prompting is going to fix the basic issue here. You're right. Sorry for the noise. From Brian.Murrell at Sun.COM Mon Apr 14 05:41:17 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Mon, 14 Apr 2008 08:41:17 -0400 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI Message-ID: <1208176877.22671.54.camel@pc.ilinx> I have run into a conflict trying to build a matching kernel and kernel-ib pair for OFED 1.3 and RHEL5's 2.6.18 kernel (although I suspect this will apply to generally any kernel of the same vintage). The problem is that OFED 1.3 appears to include/provide some iSCSI support, such as drivers/scsi/scsi_transport_iscsi.c for one example which is the "SCSI_ISCSI_ATTRS" kernel attribute. The 2.6.18 RHEL5 kernel can provide the same capability if one decides to configure it into the kernel build. So the question arises, do I want to have the kernel provide it or have kernel-ib provide it. It's not quite that easy though. If I disable it in the kernel, I also disable dependent drivers such as SCSI_QLA_ISCSI (QLogic ISP4XXX host adapter family support). In order to disable it in the kernel-ib build I need to disable "iser" support with "--without-iser-mod", which seems a bit like throwing the baby out with the bathwater. The reason I need to disable this in one place or another is that if my kernel RPM is providing scsi_transport_iscsi.ko and so is kernel-ib, I get an RPM conflict as the two files are in the same location in the /lib/modules/$(uname -r) tree. So how to resolve? I don't think it can be resolved easily currently. I think the ofa_kernel build system needs to be more intelligent about what's selected in the kernel and not providing duplicate capabilities. 
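A configure-time check along the lines proposed here could be sketched as below. This is only a sketch: KSRC, kconfig, and BUILD_ISCSI_ATTRS are hypothetical names, not ofa_kernel's actual build variables.

```shell
# Decide whether to build scsi_transport_iscsi.ko ourselves or whether
# the target kernel is already configured to provide SCSI_ISCSI_ATTRS.
KSRC=${KSRC:-/lib/modules/$(uname -r)/build}
kconfig="$KSRC/.config"

kernel_provides_iscsi_attrs() {
    # CONFIG_SCSI_ISCSI_ATTRS=y or =m means the kernel builds
    # scsi_transport_iscsi itself; treat a missing .config as "no".
    [ -r "$kconfig" ] && grep -Eq '^CONFIG_SCSI_ISCSI_ATTRS=(y|m)' "$kconfig"
}

if kernel_provides_iscsi_attrs; then
    BUILD_ISCSI_ATTRS=no    # skip the duplicate module, avoid the RPM conflict
else
    BUILD_ISCSI_ATTRS=yes
fi
echo "BUILD_ISCSI_ATTRS=$BUILD_ISCSI_ATTRS"
```

The same pattern would apply to any other capability the kernel and ofa_kernel can both provide: probe the kernel's .config once at configure time and suppress the duplicate module.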
IOW, I should be able to select CONFIG_SCSI_ISCSI_ATTRS=m in my kernel .config and CONFIG_INFINIBAND_ISER=m in my ofa_kernel configuration and ofa_kernel should figure out if it needs to provide SCSI_ISCSI_ATTRS (i.e. build scsi_transport_iscsi.ko) or whether the kernel is configured to and will be providing it. Thots? b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From erezz at voltaire.com Mon Apr 14 05:50:18 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 15:50:18 +0300 Subject: [ofa-general] [PATCH] do not change itt endianness In-Reply-To: <47E14B45.9040509@cs.wisc.edu> References: <47E14B45.9040509@cs.wisc.edu> Message-ID: <4803530A.3010408@voltaire.com> The itt field in struct iscsi_data is not defined with any particular endianness. open-iscsi should use it as-is without changing its endianness. Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iser_initiator.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 83247f1..d904070 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -416,7 +416,7 @@ int iser_send_data_out(struct iscsi_conn *conn, if (iser_check_xmit(conn, ctask)) return -ENOBUFS; - itt = ntohl(hdr->itt); + itt = hdr->itt; data_seg_len = ntoh24(hdr->dlength); buf_offset = ntohl(hdr->offset); -- 1.5.3.6 From erezz at voltaire.com Mon Apr 14 06:05:58 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 16:05:58 +0300 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <1208176877.22671.54.camel@pc.ilinx> References: <1208176877.22671.54.camel@pc.ilinx> Message-ID: <480356B6.3040403@voltaire.com> Brian J. 
Murrell wrote: > I have run into a conflict trying to build a matching kernel and > kernel-ib pair for OFED 1.3 and RHEL5's 2.6.18 kernel (although I > suspect this will apply to generally any kernel of the same vintage). > General comment - in the future, I suggest that you send OFED related e-mails also to the EWG list and to me (I maintain iSER in OFED & kernel.org). > The problem is that OFED 1.3 appears to include/provide some iSCSI > support, such as drivers/scsi/scsi_transport_iscsi.c for one example > which is the "SCSI_ISCSI_ATTRS" kernel attribute. The 2.6.18 RHEL5 > kernel can provide the same capability if one decides to configure it > into the kernel build. > OFED 1.3 provides open-iscsi 2.0-865.15 (userspace & kernel). This version is newer than the version that is shipped with RHEL5. It also has full iSER support. > So the question arises, do I want to have the kernel provide it or have > kernel-ib provide it. It's not quite that easy though. > > If I disable it in the kernel, I also disable dependent drivers such as > SCSI_QLA_ISCSI (QLogic ISP4XXX host adapter family support). In order > to disable it in the kernel-ib build I need to disable "iser" support > with "--without-iser-mod", which seems a bit like throwing the baby out > with the bathwater. > Yeah, it is an open-iscsi transport, so you must have open-iscsi in order to use this driver. With OFED 1.3, qla4xxx is not included. We only included the TCP & iSER transports. > The reason I need to disable this in one place or another is that if my > kernel RPM is providing scsi_transport_iscsi.ko and so is kernel-ib, I > get an RPM conflict as the two files are in the same location in > the /lib/modules/$(uname -r) tree. > Of course. You can't have open-iscsi modules twice. > So how to resolve? I don't think it can be resolved easily currently. > I think the ofa_kernel build system needs to be more intelligent about > what's selected in the kernel and not providing duplicate capabilities. 
> IOW, I should be able to select CONFIG_SCSI_ISCSI_ATTRS=m in my > kernel .config and CONFIG_INFINIBAND_ISER=m in my ofa_kernel > configuration and ofa_kernel should figure out if it needs to provide > SCSI_ISCSI_ATTRS (i.e. build scsi_transport_iscsi.ko) or whether the > kernel is configured to and will be providing it. > OFED is shipped with its own version of open-iscsi because I don't want to support multiple versions of open-iscsi (each distro has its own version of open-iscsi). Also, having a newer version of open-iscsi (which we have in OFED) fixes many bugs and adds new features (which is good). Is qla4xxx the only problem that you have with open-iscsi in OFED? Erez From Brian.Murrell at Sun.COM Mon Apr 14 06:41:22 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Mon, 14 Apr 2008 09:41:22 -0400 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <480356B6.3040403@voltaire.com> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> Message-ID: <1208180482.22671.67.camel@pc.ilinx> On Mon, 2008-04-14 at 16:05 +0300, Erez Zilber wrote: > > General comment - in the future, I suggest that you send OFED related > e-mails also to the EWG list and to me (I maintain iSER in OFED & > kernel.org). I will probably need to subscribe first. :-( > OFED 1.3 provides open-iscsi 2.0-865.15 (userspace & kernel). This > version is newer than the version that is shipped with RHEL5. It also > has full iSER support. Yeah, I had a feeling that what I really wanted was to use the ofa_kernel one. > Yeah, it is an open-iscsi transport, so you must have open-iscsi in > order to use this driver. With OFED 1.3, qla4xxx is not included. We > only included the TCP & iSER transports. Indeed. > Of course. You can't have open-iscsi modules twice. Exactly, which is why I want to disable them in the kernel if I can. 
> OFED is shipped with its own version of open-iscsi because I don't want > to support multiple versions of open-iscsi (each distro has its own > version of open-iscsi). That's certainly fair enough. > Also, having a newer version of open-iscsi > (which we have in OFED) fixes many bugs and adds new features (which is > good). Indeed. All the more reason to use the OFED supplied one. However... > Is qla4xxx the only problem that you have with open-iscsi in OFED? Looking through the kernel Kconfig files, it does appear that SCSI_QLA_ISCSI is the only driver needing SCSI_ISCSI_ATTRS that isn't in the OFED 1.3 release. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From erezz at voltaire.com Mon Apr 14 06:50:10 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 16:50:10 +0300 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <1208180482.22671.67.camel@pc.ilinx> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> <1208180482.22671.67.camel@pc.ilinx> Message-ID: <48036112.6070505@voltaire.com> >> Is qla4xxx the only problem that you have with open-iscsi in OFED? >> > > Looking through the kernel Kconfig files, it does appear that > SCSI_QLA_ISCSI is the only driver needing SCSI_ISCSI_ATTRS that isn't in > the OFED 1.3 release. > I'm not sure if there's a real demand for this transport for OFED users, is there? Adding qla4xxx will require backport patches for all supported distros, and we don't have the HW to test it. Therefore, unless it's really important for enough OFED users, I don't think that we should add it. BTW - I don't mind if other people add the required code to OFED 1.4 for qla4xxx support. Erez From Brian.Murrell at Sun.COM Mon Apr 14 07:36:50 2008 From: Brian.Murrell at Sun.COM (Brian J. 
Murrell) Date: Mon, 14 Apr 2008 10:36:50 -0400 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <48036112.6070505@voltaire.com> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> <1208180482.22671.67.camel@pc.ilinx> <48036112.6070505@voltaire.com> Message-ID: <1208183810.22671.90.camel@pc.ilinx> On Mon, 2008-04-14 at 16:50 +0300, Erez Zilber wrote: > > I'm not sure if there's a real demand for this transport for OFED users, > is there? Maybe I'm not seeing the bigger picture but it seems pretty orthogonal to me. Does using OFED 1.3 preclude using a qla4xxx host adapter? IOW, is there anything inherent in using OFED 1.3 as the networking fabric on a (say) storage server that uses a QLogic ISP4XXX adapter to access it's storage? > Adding qla4xxx will require backport patches for all supported > distros, and we don't have the HW to test it. Yeah, the old conundrum. > Therefore, unless it's > really important for enough OFED users, I don't think that we should add it. Well, given the alternative that it's completely unbuildable in the kernel when you choose OFED's iscsi options, is including the qla4xxx in the OFED distribution, even untested so bad? > BTW - I don't mind if other people add the required code to OFED 1.4 for > qla4xxx support. ~sigh~ Yeah. I wonder how many (if any) of our userbase we are going to upset if we cease providing the qla4xxx driver in our kernels. On the other hand, I wonder how many we'd upset by not providing iSER and the newer open-iscsi modules. b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From erezz at voltaire.com Mon Apr 14 07:56:03 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 17:56:03 +0300 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <1208183810.22671.90.camel@pc.ilinx> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> <1208180482.22671.67.camel@pc.ilinx> <48036112.6070505@voltaire.com> <1208183810.22671.90.camel@pc.ilinx> Message-ID: <48037083.6000209@voltaire.com> Brian J. Murrell wrote: > On Mon, 2008-04-14 at 16:50 +0300, Erez Zilber wrote: > >> I'm not sure if there's a real demand for this transport for OFED users, >> is there? >> > > Maybe I'm not seeing the bigger picture but it seems pretty orthogonal > to me. Does using OFED 1.3 preclude using a qla4xxx host adapter? IOW, > is there anything inherent in using OFED 1.3 as the networking fabric on > a (say) storage server that uses a QLogic ISP4XXX adapter to access it's > storage? > In theory, there's no reason we couldn't add qla4xxx to open-iscsi in OFED 1.3. The only problem is that someone actually has to do that. BTW - you can't use open-iscsi from OFED 1.3 with qla4xxx from the distro kernel because they may not work together. > >> Adding qla4xxx will require backport patches for all supported >> distros, and we don't have the HW to test it. >> > > Yeah, the old conundrum. > > >> Therefore, unless it's >> really important for enough OFED users, I don't think that we should add it. >> > > Well, given the alternative that it's completely unbuildable in the > kernel when you choose OFED's iscsi options, is including the qla4xxx in > the OFED distribution, even untested, so bad? > I don't mind, but I'm not sure if Voltaire will do that. We need to make a decision on that. > >> BTW - I don't mind if other people add the required code to OFED 1.4 for >> qla4xxx support.
>> > > ~sigh~ Yeah. > > I wonder how many (if any) of our userbase we are going to upset if we > cease providing the qla4xxx driver in our kernels. On the other hand, I > wonder how many we'd upset by not providing iSER and the newer > open-iscsi modules. > > Yeah, I understand. Let me get back to you on this issue. Erez From rdreier at cisco.com Mon Apr 14 09:01:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 09:01:47 -0700 Subject: [ofa-general] Pending libibverbs patches? Message-ID: I would like to make a 1.1.2 release of libibverbs as a sort of checkpoint before working on possibly destabilizing stuff such as merging XRC or other verbs extensions. But I would like to know what pending work people have sent me (that I've probably lost track of), especially small safe stuff that could go into 1.1.2. Thanks, Roland
From clameter at sgi.com Mon Apr 14 12:57:00 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 14 Apr 2008 12:57:00 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: References: Message-ID: On Tue, 8 Apr 2008, Andrea Arcangeli wrote: > + /* > + * Called when nobody can register any more notifier in the mm > + * and after the "mn" notifier has been disarmed already. > + */ > + void (*release)(struct mmu_notifier *mn, > + struct mm_struct *mm); Hmmm... The unregister function does not call this. Guess driver calls unregister function and does release like stuff on its own. > + /* > + * invalidate_range_start() and invalidate_range_end() must be > + * paired. Multiple invalidate_range_start/ends may be nested > + * or called concurrently. > + */ How could they be nested or called concurrently? > +/* > + * mm_users can't go down to zero while mmu_notifier_unregister() > + * runs or it can race with ->release. So a mm_users pin must > + * be taken by the caller (if mm can be different from current->mm).
> + */ > +int mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) > +{ > + struct mm_lock_data *data; > + > + BUG_ON(!atomic_read(&mm->mm_users)); > + > + data = mm_lock(mm); > + if (unlikely(IS_ERR(data))) > + return PTR_ERR(data); > + hlist_del(&mn->hlist); > + mm_unlock(mm, data); > + return 0; Hmmm.. Ok, the user of the notifier does not get notified that it was unregistered. From clameter at sgi.com Mon Apr 14 12:57:56 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 14 Apr 2008 12:57:56 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 3 of 9] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <33de2e17d0f567051583.1207669446@duo.random> References: <33de2e17d0f567051583.1207669446@duo.random> Message-ID: Not sure why this patch is not merged into 2 of 9. Same comment as last round. From clameter at sgi.com Mon Apr 14 12:59:54 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 14 Apr 2008 12:59:54 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: References: Message-ID: Where is the documentation on locking that you wanted to provide? From arlin.r.davis at intel.com Mon Apr 14 15:38:02 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 14 Apr 2008 15:38:02 -0700 Subject: [ofa-general] [PATCH][RFC] dapl v1.2: change packaging to modify OFA provider contents of dat.conf instead of file replacement. Message-ID: Change the packaging to update only the OFA provider contents in dat.conf. This allows other dapl providers, other than OFA, to co-exist and configure properly. Add a man page explaining the syntax of the static configuration file since there will no longer be comments in dat.conf.
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- Makefile.am | 23 +++++++++++++++++--- dapl.spec.in | 25 +++++++++++++++++++--- doc/dat.conf | 26 ----------------------- man/dat.conf.5 | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 102 insertions(+), 34 deletions(-) delete mode 100644 doc/dat.conf create mode 100644 man/dat.conf.5 diff --git a/Makefile.am b/Makefile.am index 5621768..079ad7f 100644 --- a/Makefile.am +++ b/Makefile.am @@ -17,8 +17,6 @@ else DBGFLAGS = -g endif -sysconf_DATA = doc/dat.conf - datlibdir = $(libdir) dapllibcmadir = $(libdir) @@ -183,7 +181,7 @@ libdatinclude_HEADERS = dat/include/dat/dat.h \ dat/include/dat/udat_redirection.h \ dat/include/dat/udat_vendor_specific.h -man_MANS = man/dtest.1 man/dapltest.1 +man_MANS = man/dtest.1 man/dapltest.1 man/dat.conf.5 EXTRA_DIST = dat/common/dat_dictionary.h \ dat/common/dat_dr.h \ @@ -231,7 +229,6 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dapl/openib_scm/dapl_ib_dto.h \ dapl/openib_scm/dapl_ib_util.h \ dat/udat/libdat.map \ - doc/dat.conf \ dapl/udapl/libdaplcma.map \ dapl.spec.in \ $(man_MANS) \ @@ -265,5 +262,23 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dist-hook: dapl.spec cp dapl.spec $(distdir) + +install-exec-hook: + if test -e $(sysconfdir)/dat.conf; then \ + echo "exec-hook"; \ + sed -e '/OpenIB-.* u1/d' < $(sysconfdir)/dat.conf > /tmp/$$$$OpenIBdapl; \ + cp /tmp/$$$$OpenIBdapl $(sysconfdir)/dat.conf; \ + fi; \ + echo OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib0 0" ""' >> $(sysconfdir)/dat.conf; \ + echo OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib1 0" ""' >> $(sysconfdir)/dat.conf; \ + echo OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib2 0" ""' >> $(sysconfdir)/dat.conf; \ + echo OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib3 0" ""' >> $(sysconfdir)/dat.conf; \ + echo OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"bond0 0" 
""' >> $(sysconfdir)/dat.conf; + +uninstall-hook: + if test -e $(sysconfdir)/dat.conf; then \ + sed -e '/OpenIB-.* u1/d' < $(sysconfdir)/dat.conf > /tmp/$$$$OpenIBdapl; \ + cp /tmp/$$$$OpenIBdapl $(sysconfdir)/dat.conf; \ + fi; SUBDIRS = . test/dtest test/dapltest diff --git a/dapl.spec.in b/dapl.spec.in index e3875a1..239e285 100644 --- a/dapl.spec.in +++ b/dapl.spec.in @@ -87,13 +87,29 @@ rm -f %{buildroot}%{_libdir}/*.la %clean rm -rf %{buildroot} -%post -p /sbin/ldconfig -%postun -p /sbin/ldconfig +%post +/sbin/ldconfig +if [ -e %{_sysconfdir}/dat.conf ]; then + sed -e '/OpenIB-.* u1/d' < %{_sysconfdir}/dat.conf > /tmp/$$ofadapl + mv /tmp/$$ofadapl %{_sysconfdir}/dat.conf +fi +echo OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib0 0" ""' >> %{_sysconfdir}/dat.conf +echo OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib1 0" ""' >> %{_sysconfdir}/dat.conf +echo OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib2 0" ""' >> %{_sysconfdir}/dat.conf +echo OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib3 0" ""' >> %{_sysconfdir}/dat.conf +echo OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"bond0 0" ""' >> %{_sysconfdir}/dat.conf + + +%postun +/sbin/ldconfig +if [ -e %{_sysconfdir}/dat.conf ]; then + sed -e '/OpenIB-.* u1/d' < %{_sysconfdir}/dat.conf > /tmp/$$OpenIBdapl + mv /tmp/$$OpenIBdapl %{_sysconfdir}/dat.conf +fi %files %defattr(-,root,root,-) %{_libdir}/libda*.so.* -%config(noreplace) %{_sysconfdir}/dat.conf %doc AUTHORS README ChangeLog %files devel @@ -109,7 +125,8 @@ rm -rf %{buildroot} %files utils %defattr(-,root,root,-) %{_bindir}/* -%{_mandir}/man1/* +%{_mandir}/man1/*.1* +%{_mandir}/man5/*.5* %changelog * Thu Feb 14 2008 Arlin Davis - 1.2.5 diff --git a/doc/dat.conf b/doc/dat.conf deleted file mode 100644 index 06142f8..0000000 --- a/doc/dat.conf +++ /dev/null @@ -1,26 +0,0 @@ -# -# DAT 1.2 and 2.0 configuration file -# -# Each entry should have the 
following fields: -# -# \ -# -# -# For the uDAPL cma provder, specify as one of the following: -# network address, network hostname, or netdev name and 0 for port -# -# Simple (OpenIB-cma) default with netdev name provided first on list -# to enable use of same dat.conf version on all nodes -# -# 1.2 and 2.0 examples for multiple interfaces, IPoIB HA failover, bonding: -# -OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" "" -OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" "" -OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib2 0" "" -OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib3 0" "" -OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "bond0 0" "" -ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" "" -ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" "" -ofa-v2-ib2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib2 0" "" -ofa-v2-ib3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib3 0" "" -ofa-v2-bond u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "bond0 0" "" diff --git a/man/dat.conf.5 b/man/dat.conf.5 new file mode 100644 index 0000000..6dee668 --- /dev/null +++ b/man/dat.conf.5 @@ -0,0 +1,62 @@ +.TH "DAT.CONF" "5" "25 March 2008" "" "" +.SH NAME +dat.conf \- configuration file for static registration of user-level DAT rdma providers +.SH "DESCRIPTION" +.PP +The DAT (direct access transport) architecture supports the use of +multiple DAT providers within a single consumer application. +Consumers implicitly select a provider using the Interface Adapter +name parameter passed to dat_ia_open(). +.PP +The subsystem that maps Interface Adapter names to provider +implementations is known as the DAT registry. When a consumer calls +dat_ia_open(), the appropriate provider is found and notified of the +consumer's request to access the IA. 
After this point, all DAT API +calls acting on DAT objects are automatically directed to the +appropriate provider entry points. +.PP +A persistent, administratively configurable database is used to store +mappings from IA names to provider information. This provider +information includes: the file system path to the provider library +object, version information, and thread safety information. The +location and format of the registry is platform dependent. This +database is known as the Static Registry (SR) and is provided via +entries in the \fIdat.conf\fR file. The process of adding a provider +entry is termed Static Registration. +.PP +.SH "Registry File Format" +\br + * All characters after # on a line are ignored (comments). + * Lines on which there are no characters other than whitespace + and comments are considered blank lines and are ignored. + * Non-blank lines must have seven whitespace separated fields. + These fields may contain whitespace if the field is quoted + with double quotes. Within fields quoted with double quotes, + the backslash and quote are valid escape sequences. + * Each non-blank line will contain the following fields: + - The IA Name. + - The API version of the library: + [k|u]major.minor where "major" and "minor" are both integers + in decimal format. User-level examples: "u1.2", and "u2.0". + - Whether the library is thread-safe: [threadsafe|nonthreadsafe] + - Whether this is the default section: [default|nondefault] + - The library image, version included, to be loaded.
+ - The vendor id and version of DAPL provider: id.major.minor + - ia params, IA specific parameters - device name and port + - platform params, (not used) +.PP +.SH Example netdev entries for OpenFabrics rdma_cm providers, both v1.2 and v2.0 +\br + OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" "" + ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" "" + + NOTE: The OpenFabrics providers specify the device with one of the following: + network address, network hostname, or netdev name; along with port number. + + The OpenIB- and ofa-v2- IA names are unique mappings reserved for OpenFabrics providers. +.PP +The default location for this configuration file is /etc/dat.conf. +The file location may be overridden with the environment variable DAT_OVERRIDE=/your_own_directory/your_dat.conf. +.PP +.SH "SEE ALSO" +.PP -- 1.5.2.5 From arlin.r.davis at intel.com Mon Apr 14 15:38:06 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 14 Apr 2008 15:38:06 -0700 Subject: [ofa-general] [PATCH][RFC] dapl v2.0: change packaging to modify OFA provider contents of dat.conf instead of file replacement. Message-ID: <000001c89e80$35985f80$9f97070a@amr.corp.intel.com> Change the packaging to update only the OFA provider contents in dat.conf. This allows other dapl providers, other than OFA, to co-exist and configure properly. Adding a man page to explain the syntax of this static configuration file since there will no longer be comments in dat.conf.
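Both the v1.2 and v2.0 packaging patches rely on the same edit-in-place idiom: delete any previous OFA entries from dat.conf with sed, then append a fresh set, so repeated installs stay idempotent and entries from other vendors survive. A minimal standalone sketch of that idiom follows; the "vendor-x" entry and temp-file handling are illustrative, not taken from the patch.

```shell
# Standalone rendition of the %post/%postun idiom above: strip stale
# OFA v2 entries, keep everything else, append the current entries.
conf=$(mktemp)

# Simulate an existing dat.conf holding a foreign provider entry plus a
# stale OFA v2 entry that the upgrade must replace.
cat > "$conf" <<'EOF'
vendor-x u2.0 nonthreadsafe default libdaplx.so.2 dapl.2.0 "eth0 0" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.1 dapl.2.0 "ib0 0" ""
EOF

# Step 1 (as in %post): drop every previous OFA v2 entry, keep the rest.
tmp=$(mktemp)
sed -e '/ofa-v2-.* u2/d' "$conf" > "$tmp"
mv "$tmp" "$conf"

# Step 2: append the refreshed OFA entries (one shown of the five).
echo 'ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""' >> "$conf"

cat "$conf"   # vendor-x survives; the OFA entry is now the new one
rm -f "$conf"
```

The v1.2 packaging uses the pattern 'OpenIB-.* u1' in exactly the same way.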
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- Makefile.am | 25 ++++++++++++++++++---- dapl.spec.in | 24 ++++++++++++++++++--- doc/dat.conf | 26 ----------------------- man/dat.conf.5 | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 102 insertions(+), 35 deletions(-) delete mode 100755 doc/dat.conf create mode 100644 man/dat.conf.5 diff --git a/Makefile.am b/Makefile.am index 60b3db6..bb75dea 100755 --- a/Makefile.am +++ b/Makefile.am @@ -25,8 +25,6 @@ else DBGFLAGS = -g endif -sysconf_DATA = doc/dat.conf - datlibdir = $(libdir) dapllibofadir = $(libdir) @@ -195,7 +193,7 @@ libdatinclude_HEADERS = dat/include/dat2/dat.h \ dat/include/dat2/udat_vendor_specific.h \ dat/include/dat2/dat_ib_extensions.h -man_MANS = man/dtest.1 man/dapltest.1 +man_MANS = man/dtest.1 man/dapltest.1 man/dat.conf.5 EXTRA_DIST = dat/common/dat_dictionary.h \ dat/common/dat_dr.h \ @@ -241,7 +239,6 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dapl/openib_cma/dapl_ib_dto.h \ dapl/openib_cma/dapl_ib_util.h \ dat/udat/libdat2.map \ - doc/dat.conf \ dapl/udapl/libdaplofa.map \ dapl.spec.in \ $(man_MANS) \ @@ -275,5 +272,23 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dist-hook: dapl.spec cp dapl.spec $(distdir) - + +install-exec-hook: + if test -e $(sysconfdir)/dat.conf; then \ + sed -e '/ofa-v2-.* u2/d' < $(sysconfdir)/dat.conf > /tmp/$$$$ofadapl; \ + cp /tmp/$$$$ofadapl $(sysconfdir)/dat.conf; \ + fi; \ + echo ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib0 0" ""' >> $(sysconfdir)/dat.conf; \ + echo ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib1 0" ""' >> $(sysconfdir)/dat.conf; \ + echo ofa-v2-ib2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib2 0" ""' >> $(sysconfdir)/dat.conf; \ + echo ofa-v2-ib3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib3 0" ""' >> $(sysconfdir)/dat.conf; \ + echo ofa-v2-bond u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"bond0 0" ""' >> 
$(sysconfdir)/dat.conf; + +uninstall-hook: + if test -e $(sysconfdir)/dat.conf; then \ + sed -e '/ofa-v2-.* u2/d' < $(sysconfdir)/dat.conf > /tmp/$$$$ofadapl; \ + cp /tmp/$$$$ofadapl $(sysconfdir)/dat.conf; \ + fi; + SUBDIRS = . test/dtest test/dapltest + diff --git a/dapl.spec.in b/dapl.spec.in index 945ec78..1c656ca 100644 --- a/dapl.spec.in +++ b/dapl.spec.in @@ -87,13 +87,28 @@ rm -f %{buildroot}%{_libdir}/*.la %clean rm -rf %{buildroot} -%post -p /sbin/ldconfig -%postun -p /sbin/ldconfig +%post +/sbin/ldconfig +if [ -e %{_sysconfdir}/dat.conf ]; then + sed -e '/ofa-v2-.* u2/d' < %{_sysconfdir}/dat.conf > /tmp/$$ofadapl + mv /tmp/$$ofadapl %{_sysconfdir}/dat.conf +fi +echo ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib0 0" ""' >> %{_sysconfdir}/dat.conf +echo ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib1 0" ""' >> %{_sysconfdir}/dat.conf +echo ofa-v2-ib2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib2 0" ""' >> %{_sysconfdir}/dat.conf +echo ofa-v2-ib3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib3 0" ""' >> %{_sysconfdir}/dat.conf +echo ofa-v2-bond u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"bond0 0" ""' >> %{_sysconfdir}/dat.conf + +%postun +/sbin/ldconfig +if [ -e %{_sysconfdir}/dat.conf ]; then + sed -e '/ofa-v2-.* u2/d' < %{_sysconfdir}/dat.conf > /tmp/$$ofadapl + mv /tmp/$$ofadapl %{_sysconfdir}/dat.conf +fi %files %defattr(-,root,root,-) %{_libdir}/libda*.so.* -%config(noreplace) %{_sysconfdir}/dat.conf %doc AUTHORS README ChangeLog %files devel @@ -109,7 +124,8 @@ rm -rf %{buildroot} %files utils %defattr(-,root,root,-) %{_bindir}/* -%{_mandir}/man1/* +%{_mandir}/man1/*.1* +%{_mandir}/man5/*.5* %changelog * Thu Feb 14 2008 Arlin Davis - 2.0.7 diff --git a/doc/dat.conf b/doc/dat.conf deleted file mode 100755 index 06142f8..0000000 --- a/doc/dat.conf +++ /dev/null @@ -1,26 +0,0 @@ -# -# DAT 1.2 and 2.0 configuration file -# -# Each entry should have the following fields: -# -# \ -# 
-# -# For the uDAPL cma provder, specify as one of the following: -# network address, network hostname, or netdev name and 0 for port -# -# Simple (OpenIB-cma) default with netdev name provided first on list -# to enable use of same dat.conf version on all nodes -# -# 1.2 and 2.0 examples for multiple interfaces, IPoIB HA failover, bonding: -# -OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" "" -OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" "" -OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib2 0" "" -OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib3 0" "" -OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "bond0 0" "" -ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" "" -ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" "" -ofa-v2-ib2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib2 0" "" -ofa-v2-ib3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib3 0" "" -ofa-v2-bond u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "bond0 0" "" diff --git a/man/dat.conf.5 b/man/dat.conf.5 new file mode 100644 index 0000000..6dee668 --- /dev/null +++ b/man/dat.conf.5 @@ -0,0 +1,62 @@ +.TH "DAT.CONF" "5" "25 March 2008" "" "" +.SH NAME +dat.conf \- configuration file for static registration of user-level DAT rdma providers +.SH "DESCRIPTION" +.PP +The DAT (direct access transport) architecture supports the use of +multiple DAT providers within a single consumer application. +Consumers implicitly select a provider using the Interface Adapter +name parameter passed to dat_ia_open(). +.PP +The subsystem that maps Interface Adapter names to provider +implementations is known as the DAT registry. When a consumer calls +dat_ia_open(), the appropriate provider is found and notified of the +consumer's request to access the IA. 
After this point, all DAT API +calls acting on DAT objects are automatically directed to the +appropriate provider entry points. +.PP +A persistent, administratively configurable database is used to store +mappings from IA names to provider information. This provider +information includes: the file system path to the provider library +object, version information, and thread safety information. The +location and format of the registry is platform dependent. This +database is known as the Static Registry (SR) and is provided via +entries in the \fIdat.conf\fR file. The process of adding a provider +entry is termed Static Registration. +.PP +.SH "Registry File Format" +\br + * All characters after # on a line are ignored (comments). + * Lines on which there are no characters other than whitespace + and comments are considered blank lines and are ignored. + * Non-blank lines must have seven whitespace separated fields. + These fields may contain whitespace if the field is quoted + with double quotes. Within fields quoted with double quotes, + the backslash and quote are valid escape sequences. + * Each non-blank line will contain the following fields: + - The IA Name. + - The API version of the library: + [k|u]major.minor where "major" and "minor" are both integers + in decimal format. User-level examples: "u1.2", and "u2.0". + - Whether the library is thread-safe: [threadsafe|nonthreadsafe] + - Whether this is the default section: [default|nondefault] + - The library image, version included, to be loaded.
+ - The vendor id and version of DAPL provider: id.major.minor + - ia params, IA specific parameters - device name and port + - platform params, (not used) +.PP +.SH Example netdev entries for OpenFabrics rdma_cm providers, both v1.2 and v2.0 +\br + OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" "" + ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" "" + + NOTE: The OpenFabrics providers specify the device with one of the following: + network address, network hostname, or netdev name; along with port number. + + The OpenIB- and ofa-v2- IA names are unique mappings reserved for OpenFabrics providers. +.PP +The default location for this configuration file is /etc/dat.conf. +The file location may be overridden with the environment variable DAT_OVERRIDE=/your_own_directory/your_dat.conf. +.PP +.SH "SEE ALSO" +.PP -- 1.5.2.5 From clameter at sgi.com Mon Apr 14 16:09:26 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 14 Apr 2008 16:09:26 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: References: Message-ID: On Tue, 8 Apr 2008, Andrea Arcangeli wrote: > The difference with #v11 is a different implementation of mm_lock that > guarantees handling signals in O(N). It's also more lowlatency friendly. Ok. So the rest of the issues remain unaddressed? I am glad that we finally settled on the locking. But now I will have to clean this up, address the remaining issues, sequence the patches right, provide docs, handle the merging issue, etc.? I have seen no detailed review of my patches that you include here. We are going down the same road as we had to go down with the OOM patches, where David Rientjes and I had to deal with the issues you raised?
From rdreier at cisco.com Mon Apr 14 21:01:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:01:24 -0700 Subject: [ofa-general] Re: [PATCH] IB/ehca: extend query_device() and query_port() to support all values for ibv_devinfo In-Reply-To: <200804071457.36248.ossrosch@linux.vnet.ibm.com> (Stefan Roscher's message of "Mon, 7 Apr 2008 13:57:33 +0100") References: <200804071457.36248.ossrosch@linux.vnet.ibm.com> Message-ID: thanks, applied From rdreier at cisco.com Mon Apr 14 21:03:20 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:03:20 -0700 Subject: [ofa-general] Re: [PATCH] IB/mlx4: Fix race when detaching a QP from a MCG In-Reply-To: <1208071321.9534.2.camel@mtls03> (Eli Cohen's message of "Sun, 13 Apr 2008 10:22:01 +0300") References: <1208071321.9534.2.camel@mtls03> Message-ID: thanks, applied. From rdreier at cisco.com Mon Apr 14 21:04:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:04:22 -0700 Subject: [ofa-general] Re: IB/mlx4: fix code comment In-Reply-To: <1208071395.9534.4.camel@mtls03> (Eli Cohen's message of "Sun, 13 Apr 2008 10:23:15 +0300") References: <1208071395.9534.4.camel@mtls03> Message-ID: thanks, applied From rdreier at cisco.com Mon Apr 14 21:05:55 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:05:55 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/iSER: Release connection resources when receiving a RDMA_CM_EVENT_DEVICE_REMOVAL event In-Reply-To: <48032B8F.1030504@voltaire.com> (Erez Zilber's message of "Mon, 14 Apr 2008 13:01:51 +0300") References: <47FB489C.6030507@voltaire.com> <48032B8F.1030504@voltaire.com> Message-ID: thanks, applied... I assume this much simpler patch replaces the earlier one completely? 
From rdreier at cisco.com Mon Apr 14 21:09:56 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:09:56 -0700 Subject: [ofa-general] Re: [PATCH] do not change itt endianness In-Reply-To: <4803530A.3010408@voltaire.com> (Erez Zilber's message of "Mon, 14 Apr 2008 15:50:18 +0300") References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> Message-ID: > - itt = ntohl(hdr->itt); > + itt = hdr->itt; This still gives the sparse warning drivers/infiniband/ulp/iser/iser_initiator.c:419:6: warning: incorrect type in assignment (different base types) drivers/infiniband/ulp/iser/iser_initiator.c:419:6: expected unsigned int [unsigned] itt drivers/infiniband/ulp/iser/iser_initiator.c:419:6: got restricted unsigned int [usertype] itt I guess the two possibilities are to use get_itt() or use a __force cast if you don't want the masking that get_itt() does. Which is correct? - R. From erezz at voltaire.com Mon Apr 14 22:50:18 2008 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 15 Apr 2008 08:50:18 +0300 Subject: [ofa-general] Re: [PATCH v2] IB/iSER: Release connection resources when receiving a RDMA_CM_EVENT_DEVICE_REMOVAL event In-Reply-To: References: <47FB489C.6030507@voltaire.com> <48032B8F.1030504@voltaire.com> Message-ID: <4804421A.2030208@voltaire.com> Roland Dreier wrote: > thanks, applied... I assume this much simpler patch replaces the earlier > one completely? > Yes (that's why I added "v2" in the subject). 
Thanks, Erez From rdreier at cisco.com Mon Apr 14 22:55:05 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 22:55:05 -0700 Subject: [ofa-general] [PATCH/RFC] IPoIB: Handle case when P_Key is deleted and re-added at same index Message-ID: If a P_Key is deleted and then re-added at the same index, then IPoIB gets confused because __ipoib_ib_dev_flush() only checks whether the index is the same without checking whether the P_Key was present, so the interface is stopped when the P_Key is deleted, but the event when the P_Key is re-added gets ignored and the interface never gets restarted. Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey() everywhere in IPoIB, since none of the places that look for P_Keys are in a fast path or in non-sleeping context, and in general we want to kill off the whole caching infrastructure eventually. This also fixes consistency problems caused because some IPoIB queries were cached and some were uncached during the window where the cache was not updated. Thanks to Venkata Subramonyam for debugging this problem and testing this fix. 
Signed-off-by: Roland Dreier --- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++-- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 10 +++++----- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 9d411f2..9db7b0b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -1007,9 +1007,9 @@ static int ipoib_cm_modify_tx_init(struct net_device *dev, struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; int qp_attr_mask, ret; - ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index); + ret = ib_find_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index); if (ret) { - ipoib_warn(priv, "pkey 0x%x not in cache: %d\n", priv->pkey, ret); + ipoib_warn(priv, "pkey 0x%x not found: %d\n", priv->pkey, ret); return ret; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 8b4ff69..0205eb7 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -594,7 +594,7 @@ static void ipoib_pkey_dev_check_presence(struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); u16 pkey_index = 0; - if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) + if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); else set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); @@ -835,13 +835,13 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int pkey_event) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); ipoib_ib_dev_down(dev, 0); ipoib_ib_dev_stop(dev, 0); - ipoib_pkey_dev_delay_open(dev); - return; + if (ipoib_pkey_dev_delay_open(dev)) + return; } - set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); /* restart QP only if P_Key index is changed */ - if (new_index == priv->pkey_index) { + if (test_and_set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags) && 
+ new_index == priv->pkey_index) { ipoib_dbg(priv, "Not flushing - P_Key index not changed.\n"); return; } -- 1.5.5 From erezz at Voltaire.COM Mon Apr 14 23:25:51 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Tue, 15 Apr 2008 09:25:51 +0300 Subject: [ewg] Re: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <48037083.6000209@voltaire.com> References: <1208176877.22671.54.camel@pc.ilinx><480356B6.3040403@voltaire.com><1208180482.22671.67.camel@pc.ilinx><48036112.6070505@voltaire.com><1208183810.22671.90.camel@pc.ilinx> <48037083.6000209@voltaire.com> Message-ID: <48044A6F.8040107@Voltaire.COM> > > > >> BTW - I don't mind if other people add the required code to OFED > 1.4 for > >> qla4xxx support. > >> > > > > ~sigh~ Yeah. > > > > I wonder how many (if any) of our userbase we are going to upset if we > > cease providing the qla4xxx driver in our kernels. On the other hand, I > > wonder how many we'd upset by not providing iSER and the newer > > open-iscsi modules. > > > > > > Yeah, I understand. Let me get back to you on this issue. > Brian, Voltaire will not be able to add qla4xxx support to open-iscsi in OFED 1.4. I understand that this may be important for some people, so if you (or anyone else) wants to add it, we can help with some info about open-iscsi and its backports & scripts in OFED (but we can't do the backports and testing ourselves). 
Erez From erezz at Voltaire.COM Mon Apr 14 23:33:29 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Tue, 15 Apr 2008 09:33:29 +0300 Subject: [ofa-general] Re: [PATCH] do not change itt endianness In-Reply-To: References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> Message-ID: <48044C39.7090403@Voltaire.COM> Roland Dreier wrote: > > - itt = ntohl(hdr->itt); > > + itt = hdr->itt; > > This still gives the sparse warning > > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: warning: incorrect type in assignment (different base types) > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: expected unsigned int [unsigned] itt > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: got restricted unsigned int [usertype] itt > > I guess the two possibilities are to use get_itt() or use a __force cast > if you don't want the masking that get_itt() does. Which is correct? > > - R. > Roland, If I just run 'make', I don't see the warning. What should I do in order to get the same warning that you get? Thanks, Erez From ogerlitz at voltaire.com Tue Apr 15 00:40:35 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 15 Apr 2008 10:40:35 +0300 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: Message-ID: <48045BF3.8040305@voltaire.com> Roland Dreier wrote: > I would like to make a 1.1.2 release of libibverbs as a sort of > checkpoint before working on possibly destabilizing stuff such as > merging XRC or other verbs extensions. But I would like to know what > pending work people have sent me (that I've probably lost track of), > especially small safe stuff that could go into 1.1.2. > There's the verbs.7 man page which was submitted on February (http://www.mail-archive.com/general at lists.openfabrics.org/msg11871.html) and following the discussion was fixed to reflect the feedback from the list (http://lists.openfabrics.org/pipermail/ewg/2008-April/006340.html). Or. 
From hrosenstock at xsigo.com Tue Apr 15 07:21:59 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Tue, 15 Apr 2008 07:21:59 -0700 Subject: [ofa-general] Re: [PATCH] do not change itt endianness In-Reply-To: <48044C39.7090403@Voltaire.COM> References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> <48044C39.7090403@Voltaire.COM> Message-ID: <1208269319.1056.103.camel@hrosenstock-ws.xsigo.com> Erez, On Tue, 2008-04-15 at 09:33 +0300, Erez Zilber wrote: > Roland Dreier wrote: > > > - itt = ntohl(hdr->itt); > > > + itt = hdr->itt; > > > > This still gives the sparse warning > > > > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: warning: incorrect type in assignment (different base types) > > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: expected unsigned int [unsigned] itt > > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: got restricted unsigned int [usertype] itt > > > > I guess the two possibilities are to use get_itt() or use a __force cast > > if you don't want the masking that get_itt() does. Which is correct? > > > > - R. > > > > Roland, > > If I just run 'make', I don't see the warning. What should I do in order > to get the same warning that you get? Try: make C=1 Look at Documentation/sparse.txt -- Hal > Thanks, > Erez > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From changquing.tang at hp.com Tue Apr 15 07:32:37 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Tue, 15 Apr 2008 14:32:37 +0000 Subject: [ofa-general] Sonoma Conference Presentation Slides Message-ID: Are all the slides ready for public access ? Can anyone tell ? Thanks. 
--CQ From Arkady.Kanevsky at netapp.com Tue Apr 15 08:50:33 2008 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Tue, 15 Apr 2008 11:50:33 -0400 Subject: [ofa-general] FW: [Interop-wg] next Interop event dates Message-ID: Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 ________________________________ From: Kanevsky, Arkady Sent: Tuesday, April 15, 2008 11:47 AM To: interop-wg at lists.openfabrics.org; openib-general at openib.org Subject: [Interop-wg] next Interop event dates Next Interop Event * IBTA Plugfest - September 22nd - 26th * iWARP Plugfest - September 22nd - 26th * OFA Interop Event - September 29th - October 3rd If you plan to participate, please, let IWG ( interop-wg at lists.openfabrics.org) know. Thanks, Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 From rdreier at cisco.com Tue Apr 15 09:21:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 15 Apr 2008 09:21:26 -0700 Subject: [ofa-general] Re: [PATCH] do not change itt endianness In-Reply-To: <48044C39.7090403@Voltaire.COM> (Erez Zilber's message of "Tue, 15 Apr 2008 09:33:29 +0300") References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> <48044C39.7090403@Voltaire.COM> Message-ID: > If I just run 'make', I don't see the warning. What should I do in order > to get the same warning that you get? You need to use sparse -- install sparse, and then add 'C=2 CF=-D__CHECK_ENDIAN__' to your make command line. - R.
From Jeffrey.C.Becker at nasa.gov Tue Apr 15 09:44:20 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Tue, 15 Apr 2008 09:44:20 -0700 Subject: [ofa-general] Sonoma Conference Presentation Slides In-Reply-To: References: Message-ID: <4804DB64.6000506@nasa.gov> I've received most of the slides, and put the presentations on the server. Jeff Scott and his team are preparing the conference web page that will allow public access. I'll add presentations as I get them. Thanks. -jeff Tang, Changqing wrote: > Are all the slides ready for public access ? Can anyone tell ? > > Thanks. > > --CQ > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From weiny2 at llnl.gov Tue Apr 15 09:47:50 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 15 Apr 2008 09:47:50 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: Message-ID: <20080415094750.35afc0e5.weiny2@llnl.gov> On Mon, 14 Apr 2008 09:01:47 -0700 Roland Dreier wrote: > I would like to make a 1.1.2 release of libibverbs as a sort of > checkpoint before working on possibly destabilizing stuff such as > merging XRC or other verbs extensions. But I would like to know what > pending work people have sent me (that I've probably lost track of), > especially small safe stuff that could go into 1.1.2. > Roland, I wonder if you would take a small patch to map enums to strings. I thought I submitted this before but I do not find it in the list archive so I must have forgotten about it. Thanks, Ira Weiny weiny2 at llnl.gov >From ccb34b2de8ecbad9e59036ba7c21cf3ac4179120 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Wed, 5 Sep 2007 17:10:11 -0700 Subject: [PATCH] Add enum strings and *_str functions for enums Signed-off-by: Ira K. 
Weiny --- Makefile.am | 3 +- examples/devinfo.c | 13 +----- examples/rc_pingpong.c | 3 +- examples/srq_pingpong.c | 3 +- examples/uc_pingpong.c | 3 +- examples/ud_pingpong.c | 3 +- include/infiniband/verbs.h | 28 ++++++++++++ src/enum_strs.c | 100 ++++++++++++++++++++++++++++++++++++++++++++ src/libibverbs.map | 5 ++ 9 files changed, 144 insertions(+), 17 deletions(-) create mode 100644 src/enum_strs.c diff --git a/Makefile.am b/Makefile.am index 705b184..46e2354 100644 --- a/Makefile.am +++ b/Makefile.am @@ -9,7 +9,8 @@ src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibve libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@ src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \ - src/marshall.c src/memory.c src/sysfs.c src/verbs.c + src/marshall.c src/memory.c src/sysfs.c src/verbs.c \ + src/enum_strs.c src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \ $(libibverbs_version_script) src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map diff --git a/examples/devinfo.c b/examples/devinfo.c index 4e4316a..1fadc80 100644 --- a/examples/devinfo.c +++ b/examples/devinfo.c @@ -67,17 +67,6 @@ static const char *guid_str(uint64_t node_guid, char *str) return str; } -static const char *port_state_str(enum ibv_port_state pstate) -{ - switch (pstate) { - case IBV_PORT_DOWN: return "PORT_DOWN"; - case IBV_PORT_INIT: return "PORT_INIT"; - case IBV_PORT_ARMED: return "PORT_ARMED"; - case IBV_PORT_ACTIVE: return "PORT_ACTIVE"; - default: return "invalid state"; - } -} - static const char *port_phy_state_str(uint8_t phys_state) { switch (phys_state) { @@ -266,7 +255,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) } printf("\t\tport:\t%d\n", port); printf("\t\t\tstate:\t\t\t%s (%d)\n", - port_state_str(port_attr.state), port_attr.state); + ibv_port_state_str(port_attr.state), port_attr.state); printf("\t\t\tmax_mtu:\t\t%s (%d)\n", mtu_str(port_attr.max_mtu), port_attr.max_mtu); 
printf("\t\t\tactive_mtu:\t\t%s (%d)\n", diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c index 7181914..26fa45c 100644 --- a/examples/rc_pingpong.c +++ b/examples/rc_pingpong.c @@ -709,7 +709,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c index bc869c9..95bebf4 100644 --- a/examples/srq_pingpong.c +++ b/examples/srq_pingpong.c @@ -805,7 +805,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/uc_pingpong.c b/examples/uc_pingpong.c index 6135030..c09c8c1 100644 --- a/examples/uc_pingpong.c +++ b/examples/uc_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c index aaee26c..8f3d50b 100644 --- a/examples/ud_pingpong.c +++ b/examples/ud_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index a51bb9d..5facbf6 100644 --- 
a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -70,6 +70,13 @@ enum ibv_node_type { IBV_NODE_ROUTER, IBV_NODE_RNIC }; +extern const char *const __ibv_node_type_str[]; +static inline const char *ibv_node_type_str(enum ibv_node_type node_type) +{ + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) + node_type = 0; + return (__ibv_node_type_str[node_type]); +} enum ibv_transport_type { IBV_TRANSPORT_UNKNOWN = -1, @@ -160,6 +167,13 @@ enum ibv_port_state { IBV_PORT_ACTIVE = 4, IBV_PORT_ACTIVE_DEFER = 5 }; +extern const char *const __ibv_port_state_str[]; +static inline const char *ibv_port_state_str(enum ibv_port_state port_state) +{ + if (port_state < IBV_PORT_NOP || port_state > IBV_PORT_ACTIVE_DEFER) + port_state = IBV_PORT_ACTIVE_DEFER + 1; + return (__ibv_port_state_str[port_state]); +} struct ibv_port_attr { enum ibv_port_state state; @@ -203,6 +217,13 @@ enum ibv_event_type { IBV_EVENT_QP_LAST_WQE_REACHED, IBV_EVENT_CLIENT_REREGISTER }; +extern const char *const __ibv_event_type_str[]; +static inline const char *ibv_event_type_str(enum ibv_event_type event) +{ + if (event < IBV_EVENT_CQ_ERR || event > IBV_EVENT_CLIENT_REREGISTER) + event = (IBV_EVENT_CLIENT_REREGISTER+1); + return (__ibv_event_type_str[event]); +} struct ibv_async_event { union { @@ -238,6 +259,13 @@ enum ibv_wc_status { IBV_WC_RESP_TIMEOUT_ERR, IBV_WC_GENERAL_ERR }; +extern const char *const __ibv_wc_status_str[]; +static inline const char *ibv_wc_status_str(enum ibv_wc_status status) +{ + if (status < IBV_WC_SUCCESS || status > IBV_WC_GENERAL_ERR) + status = IBV_WC_GENERAL_ERR; + return (__ibv_wc_status_str[status]); +} enum ibv_wc_opcode { IBV_WC_SEND, diff --git a/src/enum_strs.c b/src/enum_strs.c new file mode 100644 index 0000000..d6dee4f --- /dev/null +++ b/src/enum_strs.c @@ -0,0 +1,100 @@ +/* + * Copyright (c) 2008 Lawrence Livermore National Laboratory + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include <infiniband/verbs.h> + +const char *const __ibv_node_type_str[] = { + "UNKNOWN", + "Channel Adapter", + "Switch", + "Router", + "RNIC" +}; + +const char *const __ibv_port_state_str[] = { + "No State Change (NOP)", + "DOWN", + "INIT", + "ARMED", + "ACTIVE", + "ACTDEFER", + "UNKNOWN" +}; + +const char *const __ibv_event_type_str[] = { + "CQ Error", + "QP Fatal", + "QP Request Error", + "QP Access Error", + "Communication Established", + "SQ Drained", + "Path Migrated", + "Path Migration Request Error", + "Device Fatal", + "Port Active", + "Port Error", + "LID Change", + "PKey Change", + "SM Change", + "SRQ Error", + "SRQ Limit Reached", + "QP Last WQE Reached", + "Client Reregistration", + "UNKNOWN" +}; + +const char *const __ibv_wc_status_str[] = { + "Success", + "Local Length Error", + "Local QP Operation Error", + "Local EE Context Operation Error", + "Local Protection Error", + "Work Request Flushed Error", + "Memory Management Operation Error", + "Bad Response Error", + "Local Access Error", + "Remote Invalid Request Error", + "Remote Access Error", + "Remote Operation Error", + "Transport Retry Counter Exceeded", + "RNR Retry Counter Exceeded", + "Local RDD Violation Error", + "Remote Invalid RD Request", + "Aborted Error", + "Invalid EE Context Number", + "Invalid EE Context State", + "Fatal Error", + "Response Timeout Error", + "General Error" +}; + diff --git a/src/libibverbs.map b/src/libibverbs.map index 3a346ed..2bcf360 100644 --- a/src/libibverbs.map +++ b/src/libibverbs.map @@ -91,4 +91,9 @@ IBVERBS_1.1 { ibv_dontfork_range; ibv_dofork_range; ibv_register_driver; + + __ibv_node_type_str; + __ibv_port_state_str; + __ibv_event_type_str; + __ibv_wc_status_str; } IBVERBS_1.0; -- 1.5.1 From erezz at voltaire.com Tue Apr 15 09:53:18 2008 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 15 Apr 2008 19:53:18 +0300 Subject: [ofa-general] [PATCH v2] do not change itt endianness In-Reply-To: References: <47E14B45.9040509@cs.wisc.edu>
<4803530A.3010408@voltaire.com> Message-ID: <4804DD7E.3030501@voltaire.com> The itt field in struct iscsi_data is not defined with any particular endianness. open-iscsi should use it as-is without changing its endianness. Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iser_initiator.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 83247f1..08dc81c 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -405,7 +405,7 @@ int iser_send_data_out(struct iscsi_conn *conn, struct iser_dto *send_dto = NULL; unsigned long buf_offset; unsigned long data_seg_len; - unsigned int itt; + uint32_t itt; int err = 0; if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { @@ -416,7 +416,7 @@ int iser_send_data_out(struct iscsi_conn *conn, if (iser_check_xmit(conn, ctask)) return -ENOBUFS; - itt = ntohl(hdr->itt); + itt = (__force uint32_t)hdr->itt; data_seg_len = ntoh24(hdr->dlength); buf_offset = ntohl(hdr->offset); -- 1.5.3.6 Roland, I hope it's ok now, and thanks for the explanation about sparse. Erez From rdreier at cisco.com Tue Apr 15 09:53:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 15 Apr 2008 09:53:22 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: <20080415094750.35afc0e5.weiny2@llnl.gov> (Ira Weiny's message of "Tue, 15 Apr 2008 09:47:50 -0700") References: <20080415094750.35afc0e5.weiny2@llnl.gov> Message-ID: > I wonder if you would take a small patch to map enums to strings. I thought I > submitted this before but I do not find it in the list archive so I must have > forgotten about it. Yes, that is a useful addition (although it's not that small a patch ;). 
However > +++ b/src/libibverbs.map > @@ -91,4 +91,9 @@ IBVERBS_1.1 { > ibv_dontfork_range; > ibv_dofork_range; > ibv_register_driver; > + > + __ibv_node_type_str; > + __ibv_port_state_str; > + __ibv_event_type_str; > + __ibv_wc_status_str; I don't think we want to export the array of strings as the ABI, since that would prevent us from doing localization or anything like that in the future, and compiling the inline functions into the application just seems less flexible. So I would rather see > +static inline const char *ibv_node_type_str(enum ibv_node_type node_type) > +{ > + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) > + node_type = 0; > + return (__ibv_node_type_str[node_type]); > +} the API should be ibv_node_type_str() and it should be a non-inline exported string function. - R. From Brian.Murrell at Sun.COM Tue Apr 15 11:30:23 2008 From: Brian.Murrell at Sun.COM (Brian J.
Murrell) Date: Tue, 15 Apr 2008 14:30:23 -0400 Subject: [ewg] Re: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <48044A6F.8040107@Voltaire.COM> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> <1208180482.22671.67.camel@pc.ilinx> <48036112.6070505@voltaire.com> <1208183810.22671.90.camel@pc.ilinx> <48037083.6000209@voltaire.com> <48044A6F.8040107@Voltaire.COM> Message-ID: <1208284223.22671.151.camel@pc.ilinx> On Tue, 2008-04-15 at 09:25 +0300, Erez Zilber wrote: > Voltaire will not be able to add qla4xxx support to open-iscsi in OFED > 1.4. I understand that this may be important for some people, so if you > (or anyone else) wants to add it, we can help with some info about > open-iscsi and its backports & scripts in OFED (but we can't do the > backports and testing ourselves). Thanx for the update Erez. I understand. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rdreier at cisco.com Tue Apr 15 12:54:20 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 15 Apr 2008 12:54:20 -0700 Subject: [ofa-general] [PATCH v2] do not change itt endianness In-Reply-To: <4804DD7E.3030501@voltaire.com> (Erez Zilber's message of "Tue, 15 Apr 2008 19:53:18 +0300") References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> <4804DD7E.3030501@voltaire.com> Message-ID: thanks, applied. From rdreier at cisco.com Tue Apr 15 12:56:07 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 15 Apr 2008 12:56:07 -0700 Subject: [ofa-general] Pending libibverbs patches? 
In-Reply-To: <48045BF3.8040305@voltaire.com> (Or Gerlitz's message of "Tue, 15 Apr 2008 10:40:35 +0300") References: <48045BF3.8040305@voltaire.com> Message-ID: > There's the verbs.7 man page which was submitted in February > (http://www.mail-archive.com/general at lists.openfabrics.org/msg11871.html) > and following the discussion was fixed to reflect the feedback from > the list > (http://lists.openfabrics.org/pipermail/ewg/2008-April/006340.html). to be honest I don't think verbs.7 is ready to merge yet. I haven't had a chance to review in detail but I think it really is focusing on the wrong things right now. For example, a list of the contents of <infiniband/verbs.h> really is not useful, since we already have verbs.h; on the other hand, more detail on semantic issues such as thread-safety, IB/iWARP differences, etc. would be useful. - R. From olga.shern at gmail.com Tue Apr 15 13:13:01 2008 From: olga.shern at gmail.com (Olga Shern) Date: Tue, 15 Apr 2008 23:13:01 +0300 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? In-Reply-To: <47FA613F.3070301@mellanox.co.il> References: <20080404204758.GU29410@sgi.com> <47FA613F.3070301@mellanox.co.il> Message-ID: Hi, We also tested OFED 1.3 on PPC64 with SLES10 SP1 UP1 with connectX and Arbel HCAs Olga S (Voltaire) On Mon, Apr 7, 2008 at 9:00 PM, Tziporet Koren wrote: > Roland Dreier wrote: > > > > I know it's a long shot, but has anyone tried using OFED on > > > a kernel with 64Kbyte pages? > > > > SGI would like to support that, but I've gotten reports that > > > something is not working (e.g., "ib_rdma_bw" doesn't work on > an > > ia64 kernel with 64Kb pages). This is with the mthca driver, > fwiw. > > > > Unfortunately a conspiracy of h/w prevents me from reproducing > > > this right now, so I don't have more details. But I'd be very > > > curious to know if anyone can verify that OFED does/doesn't > > > work with 64Kbyte pages.
> > > > I don't know about OFED, but I've tried various things on 64KB PAGE_SIZE > > systems and it seems to work. It wouldn't surprise me if there are > > issues since the drivers and firmware gets a lot less testing in such > > situations but it "should work" -- I'd be happy to help debug if anyone > > has concrete problems. > > > > > OFED was tested on PPC64 with RHEL5.1 which works with 64K pages as a > default. > This was tested with our ConnectX cards (mlx4 driver) > I think IBM are using the same OS for their ehca cards too > > Tziporet > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- An HTML attachment was scrubbed... URL: From akepner at sgi.com Tue Apr 15 13:16:29 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Tue, 15 Apr 2008 13:16:29 -0700 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? In-Reply-To: References: <20080404204758.GU29410@sgi.com> <47FA613F.3070301@mellanox.co.il> Message-ID: <20080415201629.GS8593@sgi.com> On Tue, Apr 15, 2008 at 11:13:01PM +0300, Olga Shern wrote: > .. > We also tested OFED 1.3 on PPC64 with SLES10 SP1 UP1 with connectX and > Arbel HCAs Thanks Olga, Tziporet, and Roland for your responses. We found the problem - it was one of our own making, and it's been fixed. So everything looks to be working fine now with OFED on ia64 kernels with 64Kbyte pages. 
-- Arthur From olga.shern at gmail.com Tue Apr 15 13:33:02 2008 From: olga.shern at gmail.com (Olga Shern) Date: Tue, 15 Apr 2008 23:33:02 +0300 Subject: [ofa-general] OFED 1.3 user source rpm In-Reply-To: <3307cdf90804071151u7b47ad6csd57efaea13455cdb@mail.gmail.com> References: <3307cdf90804071151u7b47ad6csd57efaea13455cdb@mail.gmail.com> Message-ID: Hi, OFED 1.3 has a separate rpm for each user library; all the rpms are located in SRPMS, and you can open the one you need. Olga On Mon, Apr 7, 2008 at 9:51 PM, Rajouri Jammu wrote: > Hi, > > I could not find the ofa_user rpm in OFED 1.3. In older releases there was > a way to create a separate rpm for the user src. > > OFED-1.2.5.4]# grep ofa_user * > build_env.sh:OFA_USER_SRC_RPM=$(/bin/ls -1 ${SRPMS}/ofa_user*.src.rpm 2> > $NULL) > BUILD_ID:ofa_user-1.2.5.4: > build.sh:# Create RPMs for selected packages from ofa_user and ofa_kernel > > > I couldn't find anything like that in OFED 1.3. > > > Is there a way for me to look at the OFED 1.3 user mode sources? > > > thanks. > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- An HTML attachment was scrubbed... URL: From weiny2 at llnl.gov Tue Apr 15 13:35:48 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 15 Apr 2008 13:35:48 -0700 Subject: [PATCH v2] Add enum strings and *_str functions for enums (Was: Re: [ofa-general] Pending libibverbs patches?) In-Reply-To: References: <20080415094750.35afc0e5.weiny2@llnl.gov> Message-ID: <20080415133548.414aeaea.weiny2@llnl.gov> On Tue, 15 Apr 2008 09:53:22 -0700 Roland Dreier wrote: > > I wonder if you would take a small patch to map enums to strings. I thought I > > submitted this before but I do not find it in the list archive so I must have > > forgotten about it.
> > Yes, that is a useful addition (although it's not that small a patch ;). > > However > > > +++ b/src/libibverbs.map > > @@ -91,4 +91,9 @@ IBVERBS_1.1 { > > ibv_dontfork_range; > > ibv_dofork_range; > > ibv_register_driver; > > + > > + __ibv_node_type_str; > > + __ibv_port_state_str; > > + __ibv_event_type_str; > > + __ibv_wc_status_str; > > I don't think we want to export the array of strings as the ABI, since > that would prevent us from doing localization or anything like that in > the future, and compiling the inline functions into the application just > seems less flexible. > Good point. > > So I would rather see > > > +static inline const char *ibv_node_type_str(enum ibv_node_type node_type) > > +{ > > + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) > > + node_type = 0; > > + return (__ibv_node_type_str[node_type]); > > +} > > the API should be ibv_node_type_str() and it should be a non-inline > exported string function. > Done, here is v2 of the patch, Ira >From 82edbb7d63dcef42bdf20b0ee819dea5794c0c03 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Wed, 5 Sep 2007 17:10:11 -0700 Subject: [PATCH] Add enum strings and *_str functions for enums Signed-off-by: Ira K. 
Weiny --- Makefile.am | 3 +- examples/devinfo.c | 13 +---- examples/rc_pingpong.c | 3 +- examples/srq_pingpong.c | 3 +- examples/uc_pingpong.c | 3 +- examples/ud_pingpong.c | 3 +- include/infiniband/verbs.h | 4 ++ src/enum_strs.c | 125 ++++++++++++++++++++++++++++++++++++++++++++ src/libibverbs.map | 5 ++ 9 files changed, 145 insertions(+), 17 deletions(-) create mode 100644 src/enum_strs.c diff --git a/Makefile.am b/Makefile.am index 705b184..46e2354 100644 --- a/Makefile.am +++ b/Makefile.am @@ -9,7 +9,8 @@ src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibve libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@ src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \ - src/marshall.c src/memory.c src/sysfs.c src/verbs.c + src/marshall.c src/memory.c src/sysfs.c src/verbs.c \ + src/enum_strs.c src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \ $(libibverbs_version_script) src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map diff --git a/examples/devinfo.c b/examples/devinfo.c index 4e4316a..1fadc80 100644 --- a/examples/devinfo.c +++ b/examples/devinfo.c @@ -67,17 +67,6 @@ static const char *guid_str(uint64_t node_guid, char *str) return str; } -static const char *port_state_str(enum ibv_port_state pstate) -{ - switch (pstate) { - case IBV_PORT_DOWN: return "PORT_DOWN"; - case IBV_PORT_INIT: return "PORT_INIT"; - case IBV_PORT_ARMED: return "PORT_ARMED"; - case IBV_PORT_ACTIVE: return "PORT_ACTIVE"; - default: return "invalid state"; - } -} - static const char *port_phy_state_str(uint8_t phys_state) { switch (phys_state) { @@ -266,7 +255,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) } printf("\t\tport:\t%d\n", port); printf("\t\t\tstate:\t\t\t%s (%d)\n", - port_state_str(port_attr.state), port_attr.state); + ibv_port_state_str(port_attr.state), port_attr.state); printf("\t\t\tmax_mtu:\t\t%s (%d)\n", mtu_str(port_attr.max_mtu), port_attr.max_mtu); 
printf("\t\t\tactive_mtu:\t\t%s (%d)\n", diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c index 7181914..26fa45c 100644 --- a/examples/rc_pingpong.c +++ b/examples/rc_pingpong.c @@ -709,7 +709,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c index bc869c9..95bebf4 100644 --- a/examples/srq_pingpong.c +++ b/examples/srq_pingpong.c @@ -805,7 +805,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/uc_pingpong.c b/examples/uc_pingpong.c index 6135030..c09c8c1 100644 --- a/examples/uc_pingpong.c +++ b/examples/uc_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c index aaee26c..8f3d50b 100644 --- a/examples/ud_pingpong.c +++ b/examples/ud_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index a51bb9d..ccabb52 100644 --- 
a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -70,6 +70,7 @@ enum ibv_node_type { IBV_NODE_ROUTER, IBV_NODE_RNIC }; +const char *ibv_node_type_str(enum ibv_node_type node_type); enum ibv_transport_type { IBV_TRANSPORT_UNKNOWN = -1, @@ -160,6 +161,7 @@ enum ibv_port_state { IBV_PORT_ACTIVE = 4, IBV_PORT_ACTIVE_DEFER = 5 }; +const char *ibv_port_state_str(enum ibv_port_state port_state); struct ibv_port_attr { enum ibv_port_state state; @@ -203,6 +205,7 @@ enum ibv_event_type { IBV_EVENT_QP_LAST_WQE_REACHED, IBV_EVENT_CLIENT_REREGISTER }; +const char *ibv_event_type_str(enum ibv_event_type event); struct ibv_async_event { union { @@ -238,6 +241,7 @@ enum ibv_wc_status { IBV_WC_RESP_TIMEOUT_ERR, IBV_WC_GENERAL_ERR }; +const char *ibv_wc_status_str(enum ibv_wc_status status); enum ibv_wc_opcode { IBV_WC_SEND, diff --git a/src/enum_strs.c b/src/enum_strs.c new file mode 100644 index 0000000..7056f8a --- /dev/null +++ b/src/enum_strs.c @@ -0,0 +1,125 @@ +/* + * Copyright (c) 2008 Lawrence Livermore National Laboratory + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#include <infiniband/verbs.h> + +static const char *const __ibv_node_type_str[] = { + "UNKNOWN", + "Channel Adapter", + "Switch", + "Router", + "RNIC" +}; +const char *ibv_node_type_str(enum ibv_node_type node_type) +{ + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) + node_type = 0; + return (__ibv_node_type_str[node_type]); +} + +static const char *const __ibv_port_state_str[] = { + "No State Change (NOP)", + "DOWN", + "INIT", + "ARMED", + "ACTIVE", + "ACTDEFER", + "UNKNOWN" +}; +const char *ibv_port_state_str(enum ibv_port_state port_state) +{ + if (port_state < IBV_PORT_NOP || port_state > IBV_PORT_ACTIVE_DEFER) + port_state = IBV_PORT_ACTIVE_DEFER + 1; + return (__ibv_port_state_str[port_state]); +} + + +static const char *const __ibv_event_type_str[] = { + "CQ Error", + "QP Fatal", + "QP Request Error", + "QP Access Error", + "Communication Established", + "SQ Drained", + "Path Migrated", + "Path Migration Request Error", + "Device Fatal", + "Port Active", + "Port Error", + "LID Change", + "PKey Change", + "SM Change", + "SRQ Error", + "SRQ Limit Reached", + "QP Last WQE Reached", + "Client Reregistration", + "UNKNOWN" +}; +const char *ibv_event_type_str(enum ibv_event_type event) +{ + if (event < IBV_EVENT_CQ_ERR || event > IBV_EVENT_CLIENT_REREGISTER) + event = (IBV_EVENT_CLIENT_REREGISTER+1); + return (__ibv_event_type_str[event]); +} + +static const char *const __ibv_wc_status_str[] = { + "Success", + "Local Length Error", + "Local QP Operation Error", + "Local EE
Context Operation Error", + "Local Protection Error", + "Work Request Flushed Error", + "Memory Management Operation Error", + "Bad Response Error", + "Local Access Error", + "Remote Invalid Request Error", + "Remote Access Error", + "Remote Operation Error", + "Transport Retry Counter Exceeded", + "RNR Retry Counter Exceeded", + "Local RDD Violation Error", + "Remote Invalid RD Request", + "Aborted Error", + "Invalid EE Context Number", + "Invalid EE Context State", + "Fatal Error", + "Response Timeout Error", + "General Error" +}; +const char *ibv_wc_status_str(enum ibv_wc_status status) +{ + if (status < IBV_WC_SUCCESS || status > IBV_WC_GENERAL_ERR) + status = IBV_WC_GENERAL_ERR; + return (__ibv_wc_status_str[status]); +} + diff --git a/src/libibverbs.map b/src/libibverbs.map index 3a346ed..1827da0 100644 --- a/src/libibverbs.map +++ b/src/libibverbs.map @@ -91,4 +91,9 @@ IBVERBS_1.1 { ibv_dontfork_range; ibv_dofork_range; ibv_register_driver; + + ibv_node_type_str; + ibv_port_state_str; + ibv_event_type_str; + ibv_wc_status_str; } IBVERBS_1.0; -- 1.5.1 From ogerlitz at voltaire.com Wed Apr 16 00:32:32 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 16 Apr 2008 10:32:32 +0300 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: <48045BF3.8040305@voltaire.com> Message-ID: <4805AB90.6060702@voltaire.com> Roland Dreier wrote: > to be honest I don't think verbs.7 is ready to merge yet. I haven't had > a chance to review in detail but I think it really is focusing on the > wrong things right now. For example, a list of the contents of > <infiniband/verbs.h> really is not useful, since we already have > verbs.h; on the other hand more detail on semantic issues such as > thread-safety, IB/iWARP differences, etc. would be useful. If the section listing the different functions does not seem useful, it can be removed; I will be happy to hear what other people think. Anyway, this section is not what this man page is focusing on.
I agree that more has to be said on issues such as IB/iWARP differences, thread-safety, fork, etc., so if you prefer to see this "more" come out before merging anything, let it be. But please note that it is really hard for newcomers to start programming to IB/iWARP without any man page that gives a general notion of what libibverbs is. In that respect, maybe you can merge the first portion of the page without the function listing, and later we can add more info on the various issues? Or. From yevgenyp at mellanox.co.il Wed Apr 16 00:59:02 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 16 Apr 2008 10:59:02 +0300 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support Message-ID: <4805B1C6.80004@mellanox.co.il> Multi Protocol supplies the user with the ability to run Infiniband and Ethernet protocols on the same HCA (separately or at the same time). Main changes to mlx4_core: 1. The Mlx4 device now holds the actual protocol for each port. The port types are determined through module parameters or through the sysfs interface. The requested types are verified against firmware capabilities in order to determine the actual port protocol. 2. The driver now manages the Mac and Vlan tables used by customers of the low level driver. Corresponding commands were added. 3. Completion eq's are created per cpu. Created cq's are attached to an eq by a "Round Robin" algorithm, unless a specific eq was requested. 4. Support for creating a collapsed cq was added. 5. Additional reserved qp ranges were added. There is a range for the customers of the low level driver (IB, Ethernet, FCoE). 6. The qp allocation process changed. First a qp range should be reserved, then qps can be allocated from that range. This is to support the ability to allocate consecutive qps. Appropriate changes were made in the allocation mechanism. 7. Common actions for all HW resource management (Doorbell allocation, Buffer allocation, Mtt write) were moved to the low level driver.
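[Editor's note: the consecutive-qp reservation described in point 6 boils down to finding a run of `cnt` free bits in an allocation bitmap whose start index is a multiple of `align` (the patch below adds find_next_zero_string_aligned() for this). The following is a minimal user-space sketch of that search only; the names (toy_bitmap, alloc_range) are illustrative and not part of the driver, and the kernel version additionally takes bitmap->lock and wraps around bitmap->last, which the sketch omits.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS 64

/* Toy 64-object allocation bitmap: bit i set => object i is in use. */
struct toy_bitmap {
	uint64_t word;
};

static int toy_test_bit(const struct toy_bitmap *bm, unsigned int i)
{
	return (int)((bm->word >> i) & 1);
}

static void toy_set_bits(struct toy_bitmap *bm, unsigned int start,
			 unsigned int cnt)
{
	unsigned int i;

	for (i = 0; i < cnt; i++)
		bm->word |= 1ULL << (start + i);
}

/*
 * Reserve a run of `cnt` consecutive free objects whose first index is
 * a multiple of `align`, scanning from bit 0.  Returns the start index
 * of the reserved run, or -1 if no aligned free run exists.
 */
static int alloc_range(struct toy_bitmap *bm, unsigned int cnt,
		       unsigned int align)
{
	unsigned int start, i;

	for (start = 0; start + cnt <= TOY_BITS; start += align) {
		for (i = 0; i < cnt; i++)
			if (toy_test_bit(bm, start + i))
				break;
		if (i == cnt) {		/* whole run is free: claim it */
			toy_set_bits(bm, start, cnt);
			return (int)start;
		}
	}
	return -1;
}
```

With this scheme a consumer first reserves a range (e.g. 4 qps aligned to 4), then hands out individual qp numbers from inside it, which is what guarantees the consecutive numbering the changelog mentions.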
Signed-off-by: Yevgeny Petrilin Signed-off-by: Oren Duer Reviewed-by: Eli Cohen --- drivers/net/mlx4/Makefile | 2 +- drivers/net/mlx4/alloc.c | 258 ++++++++++++++++++++++++++++++++++- drivers/net/mlx4/cq.c | 26 +++- drivers/net/mlx4/eq.c | 41 ++++-- drivers/net/mlx4/fw.c | 18 ++- drivers/net/mlx4/fw.h | 7 +- drivers/net/mlx4/main.c | 315 +++++++++++++++++++++++++++++++++++++++++-- drivers/net/mlx4/mlx4.h | 50 +++++++- drivers/net/mlx4/mr.c | 157 ++++++++++++++++++++-- drivers/net/mlx4/port.c | 282 ++++++++++++++++++++++++++++++++++++++ drivers/net/mlx4/qp.c | 133 ++++++++++++++++--- include/linux/mlx4/cmd.h | 9 ++ include/linux/mlx4/device.h | 118 ++++++++++++++++- include/linux/mlx4/qp.h | 19 +++- 14 files changed, 1354 insertions(+), 81 deletions(-) create mode 100644 drivers/net/mlx4/port.c diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile index 0952a65..f4932d8 100644 --- a/drivers/net/mlx4/Makefile +++ b/drivers/net/mlx4/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o icm.o intf.o main.o mcg.o \ - mr.o pd.o profile.o qp.o reset.o srq.o + mr.o pd.o profile.o qp.o reset.o srq.o port.o diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 75ef9d0..044614f 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -44,15 +44,19 @@ u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap) spin_lock(&bitmap->lock); - obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last); - if (obj >= bitmap->max) { + obj = find_next_zero_bit(bitmap->table, + bitmap->effective_max, + bitmap->last); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; - obj = find_first_zero_bit(bitmap->table, bitmap->max); + obj = find_first_zero_bit(bitmap->table, bitmap->effective_max); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { set_bit(obj, bitmap->table); - bitmap->last = (obj + 1) & (bitmap->max - 1); + 
bitmap->last = (obj + 1); + if (bitmap->last == bitmap->effective_max) + bitmap->last = 0; obj |= bitmap->top; } else obj = -1; @@ -73,7 +77,84 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved) +static unsigned long find_next_zero_string_aligned(unsigned long *bitmap, + u32 start, u32 nbits, + int len, int align) +{ + unsigned long end, i; + +again: + start = ALIGN(start, align); + while ((start < nbits) && test_bit(start, bitmap)) + start += align; + if (start >= nbits) + return -1; + + end = start+len; + if (end > nbits) + return -1; + for (i = start+1; i < end; i++) { + if (test_bit(i, bitmap)) { + start = i+1; + goto again; + } + } + return start; +} + +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) +{ + u32 obj, i; + + if (likely(cnt == 1 && align == 1)) + return mlx4_bitmap_alloc(bitmap); + + spin_lock(&bitmap->lock); + + obj = find_next_zero_string_aligned(bitmap->table, bitmap->last, + bitmap->effective_max, cnt, align); + if (obj >= bitmap->effective_max) { + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + obj = find_next_zero_string_aligned(bitmap->table, 0, + bitmap->effective_max, + cnt, align); + } + + if (obj < bitmap->effective_max) { + for (i = 0; i < cnt; i++) + set_bit(obj+i, bitmap->table); + if (obj == bitmap->last) { + bitmap->last = (obj + cnt); + if (bitmap->last >= bitmap->effective_max) + bitmap->last = 0; + } + obj |= bitmap->top; + } else + obj = -1; + + spin_unlock(&bitmap->lock); + + + return obj; +} + +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) +{ + u32 i; + + obj &= bitmap->max - 1; + + spin_lock(&bitmap->lock); + for (i = 0; i < cnt; i++) + clear_bit(obj+i, bitmap->table); + bitmap->last = min(bitmap->last, obj); + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + spin_unlock(&bitmap->lock); +} + +int 
mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max) { int i; @@ -85,6 +166,7 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved bitmap->top = 0; bitmap->max = num; bitmap->mask = mask; + bitmap->effective_max = effective_max; spin_lock_init(&bitmap->lock); bitmap->table = kzalloc(BITS_TO_LONGS(num) * sizeof (long), GFP_KERNEL); if (!bitmap->table) @@ -96,6 +178,13 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved return 0; } +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) +{ + return mlx4_bitmap_init_with_effective_max(bitmap, num, mask, + reserved, num); +} + void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap) { kfree(bitmap->table); @@ -196,3 +285,160 @@ void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf) } } EXPORT_SYMBOL_GPL(mlx4_buf_free); + + +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) +{ + struct mlx4_db_pgdir *pgdir; + + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); + if (!pgdir) + return NULL; + + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); + pgdir->bits[0] = pgdir->order0; + pgdir->bits[1] = pgdir->order1; + pgdir->db_page = dma_alloc_coherent(dma_device, PAGE_SIZE, + &pgdir->db_dma, GFP_KERNEL); + if (!pgdir->db_page) { + kfree(pgdir); + return NULL; + } + + return pgdir; +} + +static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir, + struct mlx4_db *db, int order) +{ + int o; + int i; + + for (o = order; o <= 1; ++o) { + i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); + if (i < MLX4_DB_PER_PAGE >> o) + goto found; + } + + return -ENOMEM; + +found: + clear_bit(i, pgdir->bits[o]); + + i <<= o; + + if (o > order) + set_bit(i ^ 1, pgdir->bits[order]); + + db->pgdir = pgdir; + db->index = i; + db->db = pgdir->db_page + db->index; + db->dma = pgdir->db_dma + db->index * 4; + db->order = order; + + return 0; +} + 
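The find_next_zero_string_aligned() helper added above scans for a run of `len` clear bits whose start is a multiple of `align`, restarting the search past any set bit that breaks a candidate run; mlx4_bitmap_alloc_range() builds on it to hand out aligned blocks of IDs. A minimal userspace sketch of the same search loop, using a plain byte-per-bit array in place of the kernel bitmap primitives (function and macro names here are illustrative, not from the patch):

```c
#include <assert.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Find the first run of 'len' zero entries starting at a multiple of
 * 'align', searching bits[start..nbits).  Returns the start index,
 * or -1 if no such run exists. */
static long find_zero_run_aligned(const unsigned char *bits, long start,
				  long nbits, int len, int align)
{
	long i, end;

again:
	start = ALIGN_UP(start, align);
	while (start < nbits && bits[start])
		start += align;		/* skip occupied aligned slots */
	if (start >= nbits)
		return -1;

	end = start + len;
	if (end > nbits)
		return -1;
	for (i = start + 1; i < end; i++) {
		if (bits[i]) {		/* run broken: restart past the set bit */
			start = i + 1;
			goto again;
		}
	}
	return start;
}
```

As in the patch, the restart path (`start = i + 1; goto again`) re-aligns before probing again, so the scan never revisits a broken run and the cost stays linear in the bitmap size.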
+static int mlx4_db_alloc(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db, int order) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_db_pgdir *pgdir; + int ret = 0; + + mutex_lock(&priv->pgdir_mutex); + + list_for_each_entry(pgdir, &priv->pgdir_list, list) + if (!mlx4_alloc_db_from_pgdir(pgdir, db, order)) + goto out; + + pgdir = mlx4_alloc_db_pgdir(dma_device); + if (!pgdir) { + ret = -ENOMEM; + goto out; + } + + list_add(&pgdir->list, &priv->pgdir_list); + + /* This should never fail -- we just allocated an empty page: */ + WARN_ON(mlx4_alloc_db_from_pgdir(pgdir, db, order)); + +out: + mutex_unlock(&priv->pgdir_mutex); + + return ret; +} + +static void mlx4_db_free(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + int o; + int i; + + mutex_lock(&priv->pgdir_mutex); + + o = db->order; + i = db->index; + + if (db->order == 0 && test_bit(i ^ 1, db->pgdir->order0)) { + clear_bit(i ^ 1, db->pgdir->order0); + ++o; + } + + i >>= o; + set_bit(i, db->pgdir->bits[o]); + + if (bitmap_full(db->pgdir->order1, MLX4_DB_PER_PAGE / 2)) { + dma_free_coherent(dma_device, PAGE_SIZE, + db->pgdir->db_page, db->pgdir->db_dma); + list_del(&db->pgdir->list); + kfree(db->pgdir); + } + + mutex_unlock(&priv->pgdir_mutex); +} + +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size, int max_direct) +{ + int err; + + err = mlx4_db_alloc(dev, dma_device, &wqres->db, 1); + if (err) + return err; + *wqres->db.db = 0; + + if (mlx4_buf_alloc(dev, size, max_direct, &wqres->buf)) { + err = -ENOMEM; + goto err_db; + } + + err = mlx4_mtt_init(dev, wqres->buf.npages, wqres->buf.page_shift, + &wqres->mtt); + if (err) + goto err_buf; + err = mlx4_buf_write_mtt(dev, &wqres->mtt, &wqres->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev, &wqres->mtt); +err_buf: + mlx4_buf_free(dev, size, &wqres->buf); +err_db: 
+ mlx4_db_free(dev, dma_device, &wqres->db); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_alloc_hwq_res); + +void mlx4_free_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size) +{ + mlx4_mtt_cleanup(dev, &wqres->mtt); + mlx4_buf_free(dev, size, &wqres->buf); + mlx4_db_free(dev, dma_device, &wqres->db); +} +EXPORT_SYMBOL_GPL(mlx4_free_hwq_res); diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index caa5bcf..e905e61 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -188,7 +188,8 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + unsigned vector, int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -197,6 +198,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, u64 mtt_addr; int err; +#define COLLAPSED_SHIFT 18 +#define ENTRIES_SHIFT 24 + cq->cqn = mlx4_bitmap_alloc(&cq_table->bitmap); if (cq->cqn == -1) return -ENOMEM; @@ -224,8 +228,22 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context = mailbox->buf; memset(cq_context, 0, sizeof *cq_context); - cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); - cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; + cq_context->flags = cpu_to_be32(!!collapsed << COLLAPSED_SHIFT); + cq_context->logsize_usrpage = cpu_to_be32( + (ilog2(nent) << ENTRIES_SHIFT) | uar->index); + if(vector > priv->eq_table.num_comp_eqs) { + err = -EINVAL; + goto err_radix; + } + + if (vector == 0) { + vector = priv->eq_table.last_comp_eq % + priv->eq_table.num_comp_eqs + 1; + priv->eq_table.last_comp_eq = vector; + } + cq->comp_eq_idx = MLX4_EQ_COMP_CPU0 + vector - 1; + cq_context->comp_eqn = 
priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + vector - 1].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; mtt_addr = mlx4_mtt_addr(dev, mtt); @@ -274,7 +292,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) if (err) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); - synchronize_irq(priv->eq_table.eq[MLX4_EQ_COMP].irq); + synchronize_irq(priv->eq_table.eq[cq->comp_eq_idx].irq); spin_lock_irq(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index e141a15..67af1b1 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -265,7 +265,7 @@ static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) writel(priv->eq_table.clr_mask, priv->eq_table.clr_int); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) work |= mlx4_eq_int(dev, &priv->eq_table.eq[i]); return IRQ_RETVAL(work); @@ -482,7 +482,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + eq_table->num_comp_eqs; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); } @@ -555,6 +555,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) struct mlx4_priv *priv = mlx4_priv(dev); int err; int i; + int req_eqs; err = mlx4_bitmap_init(&priv->eq_table.bitmap, dev->caps.num_eqs, dev->caps.num_eqs - 1, dev->caps.reserved_eqs); @@ -573,11 +574,21 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) priv->eq_table.clr_int = priv->clr_base + (priv->eq_table.inta_pin < 32 ? 4 : 0); - err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_COMP : 0, - &priv->eq_table.eq[MLX4_EQ_COMP]); - if (err) - goto err_out_unmap; + priv->eq_table.num_comp_eqs = 0; + req_eqs = (dev->flags & MLX4_FLAG_MSI_X) ? 
num_online_cpus() : 1; + while (req_eqs) { + err = mlx4_create_eq( + dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, + (dev->flags & MLX4_FLAG_MSI_X) ? + (MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs) : 0, + &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs]); + if (err) + goto err_out_comp; + priv->eq_table.num_comp_eqs++; + req_eqs--; + } + priv->eq_table.last_comp_eq = 0; err = mlx4_create_eq(dev, MLX4_NUM_ASYNC_EQE + MLX4_NUM_SPARE_EQE, (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_ASYNC : 0, @@ -587,11 +598,11 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) if (dev->flags & MLX4_FLAG_MSI_X) { static const char *eq_name[] = { - [MLX4_EQ_COMP] = DRV_NAME " (comp)", - [MLX4_EQ_ASYNC] = DRV_NAME " (async)" + [MLX4_EQ_ASYNC] = DRV_NAME "(async)", + [MLX4_EQ_COMP_CPU0...MLX4_NUM_EQ] = "eth" DRV_NAME, }; - - for (i = 0; i < MLX4_NUM_EQ; ++i) { + for (i = 0; + i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) { err = request_irq(priv->eq_table.eq[i].irq, mlx4_msi_x_interrupt, 0, eq_name[i], priv->eq_table.eq + i); @@ -616,7 +627,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) mlx4_warn(dev, "MAP_EQ for async EQ %d failed (%d)\n", priv->eq_table.eq[MLX4_EQ_ASYNC].eqn, err); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) eq_set_ci(&priv->eq_table.eq[i], 1); return 0; @@ -625,9 +636,9 @@ err_out_async: mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_ASYNC]); err_out_comp: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP]); + for (i = 0; i < priv->eq_table.num_comp_eqs; ++i) + mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + i]); -err_out_unmap: mlx4_unmap_clr_int(dev); mlx4_free_irqs(dev); @@ -646,7 +657,7 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) mlx4_free_irqs(dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) mlx4_free_eq(dev, &priv->eq_table.eq[i]); mlx4_unmap_clr_int(dev); diff --git 
a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index d82f275..fe0f6b3 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -314,7 +314,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET); dev_cap->max_vl[i] = field >> 4; MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET); - dev_cap->max_mtu[i] = field >> 4; + dev_cap->ib_mtu[i] = field >> 4; dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET); dev_cap->max_gids[i] = 1 << (field & 0xf); @@ -322,9 +322,11 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); } } else { +#define QUERY_PORT_SUPPORTED_TYPE_OFFSET 0x00 #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 +#define QUERY_PORT_MAX_MACVLAN_OFFSET 0x0a #define QUERY_PORT_MAX_VL_OFFSET 0x0b for (i = 1; i <= dev_cap->num_ports; ++i) { @@ -334,7 +336,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) goto out; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); - dev_cap->max_mtu[i] = field & 0xf; + dev_cap->ib_mtu[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET); dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_MAX_GID_PKEY_OFFSET); @@ -342,6 +344,14 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); MLX4_GET(field, outbox, QUERY_PORT_MAX_VL_OFFSET); dev_cap->max_vl[i] = field & 0xf; + MLX4_GET(field, outbox, + QUERY_PORT_SUPPORTED_TYPE_OFFSET); + dev_cap->supported_port_types[i] = field & 3; + MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); + dev_cap->log_max_macs[i] = field & 0xf; + dev_cap->log_max_vlans[i] = field >> 4; + dev_cap->eth_mtu[i] = be16_to_cpu(((u16 *) outbox)[1]); + dev_cap->def_mac[i] = be64_to_cpu(((u64 *) outbox)[2]); } } 
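The QUERY_PORT parsing above pulls several packed fields out of the firmware mailbox: the IB MTU code lives in a nibble, the GID/P_Key table lengths are log2-encoded, and the default MAC and Ethernet MTU sit at fixed byte offsets. A hedged sketch of that unpacking style against a stand-in buffer (the offsets follow the patch's #defines, but the struct and helper names are invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define QUERY_PORT_MTU_OFFSET          0x01
#define QUERY_PORT_WIDTH_OFFSET        0x06
#define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07

struct port_caps {
	int ib_mtu;	/* IB MTU code (e.g. 5 => 4096 bytes) */
	int max_width;	/* link width capability */
	int max_gids;	/* GID table length, log2-encoded high nibble */
	int max_pkeys;	/* P_Key table length, log2-encoded low nibble */
};

static void parse_query_port(const uint8_t *outbox, struct port_caps *caps)
{
	uint8_t field;

	field = outbox[QUERY_PORT_MTU_OFFSET];
	caps->ib_mtu = field & 0xf;

	field = outbox[QUERY_PORT_WIDTH_OFFSET];
	caps->max_width = field & 0xf;

	field = outbox[QUERY_PORT_MAX_GID_PKEY_OFFSET];
	caps->max_gids  = 1 << (field >> 4);
	caps->max_pkeys = 1 << (field & 0xf);
}
```

The `1 << (field & 0xf)` idiom mirrors how the patch turns a 4-bit log2 value into a table length; the split of one byte into two log2 nibbles for GIDs and P_Keys is an assumption based on the combined MAX_GID_PKEY offset.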
@@ -379,7 +389,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) mlx4_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n", dev_cap->max_cq_sz, dev_cap->max_qp_sz, dev_cap->max_srq_sz); mlx4_dbg(dev, "Local CA ACK delay: %d, max MTU: %d, port width cap: %d\n", - dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu[1], + dev_cap->local_ca_ack_delay, 128 << dev_cap->ib_mtu[1], dev_cap->max_port_width[1]); mlx4_dbg(dev, "Max SQ desc size: %d, max SQ S/G: %d\n", dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); @@ -787,7 +797,7 @@ int mlx4_INIT_PORT(struct mlx4_dev *dev, int port) flags |= (dev->caps.port_width_cap[port] & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT; MLX4_PUT(inbox, flags, INIT_PORT_FLAGS_OFFSET); - field = 128 << dev->caps.mtu_cap[port]; + field = 128 << dev->caps.ib_mtu_cap[port]; MLX4_PUT(inbox, field, INIT_PORT_MTU_OFFSET); field = dev->caps.gid_table_len[port]; MLX4_PUT(inbox, field, INIT_PORT_MAX_GID_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index 306cb9b..ef964d5 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -61,11 +61,13 @@ struct mlx4_dev_cap { int local_ca_ack_delay; int num_ports; u32 max_msg_sz; - int max_mtu[MLX4_MAX_PORTS + 1]; + int ib_mtu[MLX4_MAX_PORTS + 1]; int max_port_width[MLX4_MAX_PORTS + 1]; int max_vl[MLX4_MAX_PORTS + 1]; int max_gids[MLX4_MAX_PORTS + 1]; int max_pkeys[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu[MLX4_MAX_PORTS + 1]; u16 stat_rate_support; u32 flags; int reserved_uars; @@ -97,6 +99,9 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 supported_port_types[MLX4_MAX_PORTS + 1]; + u8 log_max_macs[MLX4_MAX_PORTS + 1]; + u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 49a4aca..50b5eb7 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -38,6 +38,8 @@ #include #include #include +#include +#include 
#include #include @@ -81,14 +83,83 @@ static struct mlx4_profile default_profile = { .rdmarc_per_qp = 1 << 4, .num_cq = 1 << 16, .num_mcg = 1 << 13, - .num_mpt = 1 << 17, + .num_mpt = 1 << 18, .num_mtt = 1 << 20, }; +static int mod_param_num_mac = 1; +module_param_named(num_mac, mod_param_num_mac, int, 0444); +MODULE_PARM_DESC(num_mac, "Maximum number of MACs per ETH port " + "(1-127, default 1)"); + +static int mod_param_num_vlan; +module_param_named(num_vlan, mod_param_num_vlan, int, 0444); +MODULE_PARM_DESC(num_vlan, "Maximum number of VLANs per ETH port " + "(0-126, default 0)"); + +static int mod_param_use_prio; +module_param_named(use_prio, mod_param_use_prio, bool, 0444); +MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " + "(0/1, default 0)"); + +static int mod_param_if_eth = 1; +module_param_named(if_eth, mod_param_if_eth, bool, 0444); +MODULE_PARM_DESC(if_eth, "Enable ETH interface be loaded (0/1, default 1)"); + +static int mod_param_if_fc = 1; +module_param_named(if_fc, mod_param_if_fc, bool, 0444); +MODULE_PARM_DESC(if_fc, "Enable FC interface be loaded (0/1, default 1)"); + +static char *mod_param_port_type[MLX4_MAX_PORTS] = + { [0 ... (MLX4_MAX_PORTS-1)] = "ib"}; +module_param_array_named(port_type, mod_param_port_type, charp, NULL, 0444); +MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " + "comma seperated, default ib for all)"); + +static int mod_param_port_mtu[MLX4_MAX_PORTS] = + { [0 ... 
(MLX4_MAX_PORTS-1)] = 9600}; +module_param_array_named(port_mtu, mod_param_port_mtu, int, NULL, 0444); +MODULE_PARM_DESC(port_mtu, "Ports max mtu in Bytes, entry per port, " + "comma seperated, default 9600 for all"); + +static int mlx4_check_port_params(struct mlx4_dev *dev, + enum mlx4_port_type *port_type) +{ + if (port_type[0] != port_type[1] && + !(dev->caps.flags & MLX4_DEV_CAP_FLAG_DPDP)) { + mlx4_err(dev, "Only same port types supported " + "on this HCA, aborting.\n"); + return -EINVAL; + } + if ((port_type[0] == MLX4_PORT_TYPE_ETH) && + (port_type[1] == MLX4_PORT_TYPE_IB)) { + mlx4_err(dev, "eth-ib configuration is not supported.\n"); + return -EINVAL; + } + return 0; +} + +static void mlx4_str2port_type(char **port_str, + enum mlx4_port_type *port_type) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (!strcmp(port_str[i], "eth")) + port_type[i] = MLX4_PORT_TYPE_ETH; + else + port_type[i] = MLX4_PORT_TYPE_IB; + } +} + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; int i; + int num_eth_ports = 0; + enum mlx4_port_type port_type[MLX4_MAX_PORTS]; + + mlx4_str2port_type(mod_param_port_type, port_type); err = mlx4_QUERY_DEV_CAP(dev, dev_cap); if (err) { @@ -120,10 +191,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.num_ports = dev_cap->num_ports; for (i = 1; i <= dev->caps.num_ports; ++i) { dev->caps.vl_cap[i] = dev_cap->max_vl[i]; - dev->caps.mtu_cap[i] = dev_cap->max_mtu[i]; + dev->caps.ib_mtu_cap[i] = dev_cap->ib_mtu[i]; dev->caps.gid_table_len[i] = dev_cap->max_gids[i]; dev->caps.pkey_table_len[i] = dev_cap->max_pkeys[i]; dev->caps.port_width_cap[i] = dev_cap->max_port_width[i]; + dev->caps.eth_mtu_cap[i] = dev_cap->eth_mtu[i]; + dev->caps.def_mac[i] = dev_cap->def_mac[i]; } dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE; @@ -134,7 +207,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.max_rq_sg = dev_cap->max_rq_sg; 
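The mlx4_check_port_params() function above encodes two constraints: mixed port types on the two ports are only allowed when the DPDP capability flag is set, and the (eth, ib) ordering is rejected outright. A compact sketch of those rules with the capability passed in as a plain flag (names invented for illustration):

```c
#include <assert.h>

enum port_type { PORT_TYPE_IB, PORT_TYPE_ETH };

/* Returns 0 if the two-port configuration is acceptable, -1 otherwise. */
static int check_port_params(const enum port_type t[2], int have_dpdp)
{
	/* different types on the two ports require DPDP support */
	if (t[0] != t[1] && !have_dpdp)
		return -1;
	/* the eth-then-ib ordering is not supported at all */
	if (t[0] == PORT_TYPE_ETH && t[1] == PORT_TYPE_IB)
		return -1;
	return 0;
}
```

So on a DPDP-capable HCA the (ib, eth) configuration passes while (eth, ib) still fails, matching the two error paths in the patch.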
dev->caps.max_wqes = dev_cap->max_qp_sz; dev->caps.max_qp_init_rdma = dev_cap->max_requester_per_qp; - dev->caps.reserved_qps = dev_cap->reserved_qps; dev->caps.max_srq_wqes = dev_cap->max_srq_sz; dev->caps.max_srq_sge = dev_cap->max_rq_sg - 1; dev->caps.reserved_srqs = dev_cap->reserved_srqs; @@ -161,9 +233,155 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.stat_rate_support = dev_cap->stat_rate_support; dev->caps.max_gso_sz = dev_cap->max_gso_sz; + dev->caps.log_num_macs = ilog2(roundup_pow_of_two + (mod_param_num_mac + 1)); + dev->caps.log_num_vlans = ilog2(roundup_pow_of_two + (mod_param_num_vlan + 2)); + dev->caps.log_num_prios = mod_param_use_prio ? 3: 0; + + err = mlx4_check_port_params(dev, port_type); + if (err) + return err; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (!dev_cap->supported_port_types[i]) { + mlx4_warn(dev, "FW doesn't support Multi Protocol, " + "loading IB only\n"); + dev->caps.port_type[i] = MLX4_PORT_TYPE_IB; + continue; + } + if (port_type[i-1] & dev_cap->supported_port_types[i]) + dev->caps.port_type[i] = port_type[i-1]; + else { + mlx4_err(dev, "Requested port type for port %d " + "not supported by HW\n", i); + return -ENODEV; + } + if (mod_param_port_mtu[i-1] <= dev->caps.eth_mtu_cap[i]) + dev->caps.eth_mtu_cap[i] = mod_param_port_mtu[i-1]; + else + mlx4_warn(dev, "Requested mtu for port %d is larger " + "then supported, reducing to %d\n", + i, dev->caps.eth_mtu_cap[i]); + if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { + dev->caps.log_num_macs = dev_cap->log_max_macs[i]; + mlx4_warn(dev, "Requested number of MACs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_macs); + } + if (dev->caps.log_num_vlans > dev_cap->log_max_vlans[i]) { + dev->caps.log_num_vlans = dev_cap->log_max_vlans[i]; + mlx4_warn(dev, "Requested number of VLANs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_vlans); + } + if 
(dev->caps.port_type[i] == MLX4_PORT_TYPE_ETH) + ++num_eth_ports; + } + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] = dev_cap->reserved_qps; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] = + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] = + (1 << dev->caps.log_num_macs)* + (1 << dev->caps.log_num_vlans)* + (1 << dev->caps.log_num_prios)* + num_eth_ports; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH; + return 0; } +static int mlx4_change_port_types(struct mlx4_dev *dev, + enum mlx4_port_type *port_types) +{ + int i; + int err = 0; + int change = 0; + int port; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (port_types[i] != dev->caps.port_type[i + 1]) { + change = 1; + dev->caps.port_type[i + 1] = port_types[i]; + } + } + if (change) { + mlx4_unregister_device(dev); + for (port = 1; port <= dev->caps.num_ports; port++) { + mlx4_CLOSE_PORT(dev, port); + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, " + "aborting\n", port); + return err; + } + } + err = mlx4_register_device(dev); + } + return err; +} + +static ssize_t show_port_type(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + int i; + + sprintf(buf, "Current port types:\n"); + for (i = 1; i <= MLX4_MAX_PORTS; i++) { + sprintf(buf, "%sPort%d: %s\n", buf, i, + (mdev->caps.port_type[i] == MLX4_PORT_TYPE_IB)? 
+ "ib": "eth"); + } + return strlen(buf); +} + + +static ssize_t set_port_type(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + char *type; + enum mlx4_port_type port_types[MLX4_MAX_PORTS]; + char *loc_buf; + char *ptr; + int i; + int err = 0; + + loc_buf = kmalloc(count + 1, GFP_KERNEL); + if (!loc_buf) + return -ENOMEM; + + ptr = loc_buf; + memcpy(loc_buf, buf, count + 1); + for (i = 0; i < MLX4_MAX_PORTS; i++) { + type = strsep(&loc_buf, ","); + if (!strcmp(type, "ib")) + port_types[i] = MLX4_PORT_TYPE_IB; + else if (!strcmp(type, "eth")) + port_types[i] = MLX4_PORT_TYPE_ETH; + else { + dev_warn(dev, "%s is not acceptable port type " + "(use 'eth' or 'ib' only)\n", type); + err = -EINVAL; + goto out; + } + } + err = mlx4_check_port_params(mdev, port_types); + if (err) + goto out; + + err = mlx4_change_port_types(mdev, port_types); +out: + kfree(ptr); + return err ? 
err: count; +} +static DEVICE_ATTR(mlx4_port_type, S_IWUGO | S_IRUGO, show_port_type, set_port_type); + static int mlx4_load_fw(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -209,7 +427,8 @@ static int mlx4_init_cmpt_table(struct mlx4_dev *dev, u64 cmpt_base, ((u64) (MLX4_CMPT_TYPE_QP * cmpt_entry_sz) << MLX4_CMPT_SHIFT), cmpt_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) goto err; @@ -334,7 +553,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->qpc_base, dev_cap->qpc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map QP context memory, aborting.\n"); goto err_unmap_dmpt; @@ -344,7 +564,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->auxc_base, dev_cap->aux_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map AUXC context memory, aborting.\n"); goto err_unmap_qp; @@ -354,7 +575,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->altc_base, dev_cap->altc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map ALTC context memory, aborting.\n"); goto err_unmap_auxc; @@ -364,7 +586,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->rdmarc_base, dev_cap->rdmarc_entry_sz << priv->qp_table.rdmarc_shift, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map RDMARC context memory, aborting\n"); goto err_unmap_altc; @@ -556,6 +779,7 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) { struct 
mlx4_priv *priv = mlx4_priv(dev); int err; + int port; err = mlx4_init_uar_table(dev); if (err) { @@ -654,8 +878,25 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) goto err_qp_table_free; } + for (port = 1; port <= dev->caps.num_ports; port++) { + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, aborting\n", + port); + goto err_mcg_table_free; + } + } + + for (port = 0; port < dev->caps.num_ports; port++) { + mlx4_init_mac_table(dev, port); + mlx4_init_vlan_table(dev, port); + } + return 0; +err_mcg_table_free: + mlx4_cleanup_mcg_table(dev); + err_qp_table_free: mlx4_cleanup_qp_table(dev); @@ -692,22 +933,25 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); struct msix_entry entries[MLX4_NUM_EQ]; + int needed_vectors = MLX4_EQ_COMP_CPU0 + num_online_cpus(); int err; int i; if (msi_x) { - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) entries[i].entry = i; - err = pci_enable_msix(dev->pdev, entries, ARRAY_SIZE(entries)); + err = pci_enable_msix(dev->pdev, entries, needed_vectors); if (err) { if (err > 0) - mlx4_info(dev, "Only %d MSI-X vectors available, " - "not using MSI-X\n", err); + mlx4_info(dev, "Only %d MSI-X vectors " + "available, need %d. 
" + "Not using MSI-X\n", + err, needed_vectors); goto no_msi; } - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = entries[i].vector; dev->flags |= MLX4_FLAG_MSI_X; @@ -715,7 +959,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) } no_msi: - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = dev->pdev->irq; } @@ -798,6 +1042,9 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&priv->ctx_list); spin_lock_init(&priv->ctx_lock); + INIT_LIST_HEAD(&priv->pgdir_list); + mutex_init(&priv->pgdir_mutex); + /* * Now reset the HCA before we touch the PCI capabilities or * attempt a firmware command, since a boot ROM may have left @@ -836,8 +1083,14 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) pci_set_drvdata(pdev, dev); + if (device_create_file(&pdev->dev, &dev_attr_mlx4_port_type)) + goto sysfs_failed; + return 0; +sysfs_failed: + mlx4_unregister_device(dev); + err_cleanup: mlx4_cleanup_mcg_table(dev); mlx4_cleanup_qp_table(dev); @@ -893,6 +1146,7 @@ static void mlx4_remove_one(struct pci_dev *pdev) int p; if (dev) { + device_remove_file(&pdev->dev, &dev_attr_mlx4_port_type); mlx4_unregister_device(dev); for (p = 1; p <= dev->caps.num_ports; ++p) @@ -948,10 +1202,43 @@ static struct pci_driver mlx4_driver = { .remove = __devexit_p(mlx4_remove_one) }; +static int __init mlx4_verify_params(void) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; ++i) { + if (strcmp(mod_param_port_type[i], "eth") && + strcmp(mod_param_port_type[i], "ib")) { + printk(KERN_WARNING "mlx4_core: bad port_type for " + "port %d: %s\n", + i, mod_param_port_type[i]); + return -1; + } + } + if ((mod_param_num_mac < 1) || + (mod_param_num_mac > 127)) { + printk(KERN_WARNING "mlx4_core: bad num_mac: %d\n", + mod_param_num_mac); + return -1; + } + + if ((mod_param_num_vlan < 0) || + (mod_param_num_vlan > 126)) { + 
printk(KERN_WARNING "mlx4_core: bad num_vlan: %d\n", + mod_param_num_vlan); + return -1; + } + + return 0; +} + static int __init mlx4_init(void) { int ret; + if (mlx4_verify_params()) + return -EINVAL; + ret = mlx4_catas_init(); if (ret) return ret; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 7333681..2af3d07 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -64,8 +64,8 @@ enum { enum { MLX4_EQ_ASYNC, - MLX4_EQ_COMP, - MLX4_NUM_EQ + MLX4_EQ_COMP_CPU0, + MLX4_NUM_EQ = MLX4_EQ_COMP_CPU0 + NR_CPUS, }; enum { @@ -111,6 +111,7 @@ struct mlx4_bitmap { u32 last; u32 top; u32 max; + u32 effective_max; u32 mask; spinlock_t lock; unsigned long *table; @@ -210,6 +211,8 @@ struct mlx4_eq_table { void __iomem *uar_map[(MLX4_NUM_EQ + 6) / 4]; u32 clr_mask; struct mlx4_eq eq[MLX4_NUM_EQ]; + int num_comp_eqs; + int last_comp_eq; u64 icm_virt; struct page *icm_page; dma_addr_t icm_dma; @@ -250,6 +253,35 @@ struct mlx4_catas_err { struct list_head list; }; +struct mlx4_mac_table { +#define MLX4_MAX_MAC_NUM 128 +#define MLX4_MAC_MASK 0xffffffffffff +#define MLX4_MAC_VALID_SHIFT 63 +#define MLX4_MAC_TABLE_SIZE MLX4_MAX_MAC_NUM << 3 + __be64 entries[MLX4_MAX_MAC_NUM]; + int refs[MLX4_MAX_MAC_NUM]; + struct semaphore mac_sem; + int total; + int max; +}; + +struct mlx4_vlan_table { +#define MLX4_MAX_VLAN_NUM 126 +#define MLX4_VLAN_MASK 0xfff +#define MLX4_VLAN_VALID 1 << 31 +#define MLX4_VLAN_TABLE_SIZE MLX4_MAX_VLAN_NUM << 2 + __be32 entries[MLX4_MAX_VLAN_NUM]; + int refs[MLX4_MAX_VLAN_NUM]; + struct semaphore vlan_sem; + int total; + int max; +}; + +struct mlx4_port_info { + struct mlx4_mac_table mac_table; + struct mlx4_vlan_table vlan_table; +}; + struct mlx4_priv { struct mlx4_dev dev; @@ -257,6 +289,9 @@ struct mlx4_priv { struct list_head ctx_list; spinlock_t ctx_lock; + struct list_head pgdir_list; + struct mutex pgdir_mutex; + struct mlx4_fw fw; struct mlx4_cmd cmd; @@ -275,6 +310,7 @@ struct mlx4_priv { struct mlx4_uar driver_uar; 
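The mlx4_verify_params() check above rejects bad module parameters before mlx4_init() touches the hardware: each port_type entry must be "ib" or "eth", num_mac must be 1-127, and num_vlan 0-126. The same bounds checking, sketched as a plain function (names illustrative, MAX_PORTS fixed at 2 for the sketch):

```c
#include <assert.h>
#include <string.h>

#define MAX_PORTS 2

/* Mirror of the patch's parameter checks; returns 0 if valid, -1 if not. */
static int verify_params(const char *port_type[MAX_PORTS],
			 int num_mac, int num_vlan)
{
	int i;

	for (i = 0; i < MAX_PORTS; ++i)
		if (strcmp(port_type[i], "eth") && strcmp(port_type[i], "ib"))
			return -1;	/* unknown port type string */
	if (num_mac < 1 || num_mac > 127)
		return -1;		/* MACs per port: 1..127 */
	if (num_vlan < 0 || num_vlan > 126)
		return -1;		/* VLANs per port: 0..126 */
	return 0;
}
```

Failing fast here is what lets mlx4_init() return -EINVAL without registering the PCI driver at all.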
void __iomem *kar; + struct mlx4_port_info port[MLX4_MAX_PORTS]; }; static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) @@ -284,7 +320,12 @@ static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap); void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); @@ -336,10 +377,15 @@ void mlx4_cmd_use_polling(struct mlx4_dev *dev); void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn); void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type); +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port); +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port); + void mlx4_qp_event(struct mlx4_dev *dev, u32 qpn, int event_type); void mlx4_srq_event(struct mlx4_dev *dev, u32 srqn, int event_type); void mlx4_handle_catas_err(struct mlx4_dev *dev); +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port); + #endif /* MLX4_H */ diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index 79b317b..2fbf6a3 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -52,7 +52,9 @@ struct mlx4_mpt_entry { __be64 length; __be32 lkey; __be32 win_cnt; - u8 reserved1[3]; + u8 reserved1; + u8 flags2; + u8 reserved2; u8 mtt_rep; __be64 mtt_seg; __be32 mtt_sz; @@ -68,6 +70,8 @@ struct mlx4_mpt_entry { #define MLX4_MTT_FLAG_PRESENT 1 +#define MLX4_MPT_FLAG2_FBO_EN (1 << 7) + #define MLX4_MPT_STATUS_SW 0xF0 #define MLX4_MPT_STATUS_HW 0x00 @@ -250,6 +254,21 @@ static int mlx4_HW2SW_MPT(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox !mailbox, 
MLX4_CMD_HW2SW_MPT, MLX4_CMD_TIME_CLASS_B); } +int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u64 iova, u64 size, u32 access, int npages, + int page_shift, struct mlx4_mr *mr) +{ + mr->iova = iova; + mr->size = size; + mr->pd = pd; + mr->access = access; + mr->enabled = 0; + mr->key = hw_index_to_key(mridx); + + return mlx4_mtt_init(dev, npages, page_shift, &mr->mtt); +} +EXPORT_SYMBOL_GPL(mlx4_mr_alloc_reserved); + int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, int npages, int page_shift, struct mlx4_mr *mr) { @@ -261,14 +280,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, if (index == -1) return -ENOMEM; - mr->iova = iova; - mr->size = size; - mr->pd = pd; - mr->access = access; - mr->enabled = 0; - mr->key = hw_index_to_key(index); - - err = mlx4_mtt_init(dev, npages, page_shift, &mr->mtt); + err = mlx4_mr_alloc_reserved(dev, index, pd, iova, size, + access, npages, page_shift, mr); if (err) mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, index); @@ -276,9 +289,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, } EXPORT_SYMBOL_GPL(mlx4_mr_alloc); -void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) +void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr) { - struct mlx4_priv *priv = mlx4_priv(dev); int err; if (mr->enabled) { @@ -290,6 +302,13 @@ void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) } mlx4_mtt_cleanup(dev, &mr->mtt); +} +EXPORT_SYMBOL_GPL(mlx4_mr_free_reserved); + +void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + mlx4_mr_free_reserved(dev, mr); mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, key_to_hw_index(mr->key)); } EXPORT_SYMBOL_GPL(mlx4_mr_free); @@ -435,8 +454,15 @@ int mlx4_init_mr_table(struct mlx4_dev *dev) struct mlx4_mr_table *mr_table = &mlx4_priv(dev)->mr_table; int err; - err = mlx4_bitmap_init(&mr_table->mpt_bitmap, 
dev->caps.num_mpts, - ~0, dev->caps.reserved_mrws); + if (!is_power_of_2(dev->caps.num_mpts)) + return -EINVAL; + + dev->caps.reserved_fexch_mpts_base = dev->caps.num_mpts - + (2 * dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH]); + err = mlx4_bitmap_init_with_effective_max(&mr_table->mpt_bitmap, + dev->caps.num_mpts, + ~0, dev->caps.reserved_mrws, + dev->caps.reserved_fexch_mpts_base); if (err) return err; @@ -544,6 +570,56 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list } EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr); +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, + struct mlx4_fmr *fmr, + u64 *page_list, int npages, + u64 iova, u32 fbo, u32 len, + u32 *lkey, u32 *rkey) +{ + u32 key; + int i, err; + + err = mlx4_check_fmr(fmr, page_list, npages, iova); + if (err) + return err; + + ++fmr->maps; + + key = key_to_hw_index(fmr->mr.key); + + *lkey = *rkey = fmr->mr.key = hw_index_to_key(key); + + *(u8 *) fmr->mpt = MLX4_MPT_STATUS_SW; + + /* Make sure MPT status is visible before writing MTT entries */ + wmb(); + + for (i = 0; i < npages; ++i) + fmr->mtts[i] = cpu_to_be64(page_list[i] | + MLX4_MTT_FLAG_PRESENT); + + dma_sync_single(&dev->pdev->dev, fmr->dma_handle, + npages * sizeof(u64), DMA_TO_DEVICE); + + fmr->mpt->key = cpu_to_be32(key); + fmr->mpt->lkey = cpu_to_be32(key); + fmr->mpt->length = cpu_to_be64(len); + fmr->mpt->start = cpu_to_be64(iova); + fmr->mpt->first_byte_offset = cpu_to_be32(fbo & 0x001fffff); + fmr->mpt->flags2 = (fbo ? 
MLX4_MPT_FLAG2_FBO_EN : 0); + + /* Make sure MTT entries are visible before setting MPT status */ + wmb(); + + *(u8 *) fmr->mpt = MLX4_MPT_STATUS_HW; + + /* Make sure MPT status is visible before consumer can use FMR */ + wmb(); + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr_fbo); + int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, int max_maps, u8 page_shift, struct mlx4_fmr *fmr) { @@ -586,6 +662,49 @@ err_free: } EXPORT_SYMBOL_GPL(mlx4_fmr_alloc); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, + u32 pd, u32 access, int max_pages, + int max_maps, u8 page_shift, struct mlx4_fmr *fmr) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + u64 mtt_seg; + int err = -ENOMEM; + + if (page_shift < 12 || page_shift >= 32) + return -EINVAL; + + /* All MTTs must fit in the same page */ + if (max_pages * sizeof *fmr->mtts > PAGE_SIZE) + return -EINVAL; + + fmr->page_shift = page_shift; + fmr->max_pages = max_pages; + fmr->max_maps = max_maps; + fmr->maps = 0; + + err = mlx4_mr_alloc_reserved(dev, mridx, pd, 0, 0, access, max_pages, + page_shift, &fmr->mr); + if (err) + return err; + + mtt_seg = fmr->mr.mtt.first_seg * dev->caps.mtt_entry_sz; + + fmr->mtts = mlx4_table_find(&priv->mr_table.mtt_table, + fmr->mr.mtt.first_seg, + &fmr->dma_handle); + if (!fmr->mtts) { + err = -ENOMEM; + goto err_free; + } + + return 0; + +err_free: + mlx4_mr_free_reserved(dev, &fmr->mr); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_alloc_reserved); + int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -634,6 +753,18 @@ int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr) } EXPORT_SYMBOL_GPL(mlx4_fmr_free); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr) +{ + if (fmr->maps) + return -EBUSY; + + fmr->mr.enabled = 0; + mlx4_mr_free_reserved(dev, &fmr->mr); + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_free_reserved); + int mlx4_SYNC_TPT(struct mlx4_dev *dev) { return
mlx4_cmd(dev, 0, 0, 0, MLX4_CMD_SYNC_TPT, 1000); diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c new file mode 100644 index 0000000..5e685ca --- /dev/null +++ b/drivers/net/mlx4/port.c @@ -0,0 +1,282 @@ +/* + * Copyright (c) 2007 Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include <linux/errno.h> +#include <linux/if_ether.h> + +#include <linux/mlx4/cmd.h> + +#include "mlx4.h" + +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port].mac_table; + int i; + + sema_init(&table->mac_sem, 1); + for (i = 0; i < MLX4_MAX_MAC_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_macs; + table->total = 0; +} + +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port].vlan_table; + int i; + + sema_init(&table->vlan_sem, 1); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_vlans; + table->total = 0; +} + +static int mlx4_SET_PORT_mac_table(struct mlx4_dev *dev, u8 port, + __be64 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_MAC_TABLE_SIZE); + + in_mod = MLX4_SET_PORT_MAC_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} + +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index) +{ + struct mlx4_mac_table *table = + &mlx4_priv(dev)->port[port - 1].mac_table; + int i, err = 0; + int free = -1; + u64 valid = 1; + + mlx4_dbg(dev, "Registering mac : 0x%llx\n", mac); + down(&table->mac_sem); + for (i = 0; i < MLX4_MAX_MAC_NUM - 1; i++) { + if (free < 0 && !table->refs[i]) { + free = i; + continue; + } + + if (mac == (MLX4_MAC_MASK & be64_to_cpu(table->entries[i]))) { + /* Mac already registered, increase reference count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + mlx4_dbg(dev, "Free mac index is %d\n", free); + + if (table->total == table->max) { + /* No free mac entries */ + err = -ENOSPC; + goto out; + } + + /* Register new MAC 
*/ + table->refs[free] = 1; + table->entries[free] = cpu_to_be64(mac | valid << MLX4_MAC_VALID_SHIFT); + + err = mlx4_SET_PORT_mac_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_err(dev, "Failed adding mac: 0x%llx\n", mac); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->mac_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_mac); + +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_mac_table *table = + &mlx4_priv(dev)->port[port - 1].mac_table; + + down(&table->mac_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No mac entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_warn(dev, "Have more references for index %d, " + "no need to modify mac table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_mac_table(dev, port, table->entries); + --table->total; +out: + up(&table->mac_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_mac); + +static int mlx4_SET_PORT_vlan_table(struct mlx4_dev *dev, u8 port, + __be32 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_VLAN_TABLE_SIZE); + in_mod = MLX4_SET_PORT_VLAN_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + + return err; +} + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index) +{ + struct mlx4_vlan_table *table = + &mlx4_priv(dev)->port[port - 1].vlan_table; + int i, err = 0; + int free = -1; + + down(&table->vlan_sem); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + if (free < 0 && (table->refs[i] == 0)) { + free = i; + continue; + } + + if (table->refs[i] && + (vlan == (MLX4_VLAN_MASK & + be32_to_cpu(table->entries[i])))) { + /* Vlan already registered, 
increase reference count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + + if (table->total == table->max) { + /* No free vlan entries */ + err = -ENOSPC; + goto out; + } + + /* Register new VLAN */ + table->refs[free] = 1; + table->entries[free] = cpu_to_be32(vlan | MLX4_VLAN_VALID); + + err = mlx4_SET_PORT_vlan_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_warn(dev, "Failed adding vlan: %u\n", vlan); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->vlan_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_vlan); + +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_vlan_table *table = + &mlx4_priv(dev)->port[port - 1].vlan_table; + + down(&table->vlan_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No vlan entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_dbg(dev, "Have more references for index %d, " + "no need to modify vlan table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_vlan_table(dev, port, table->entries); + --table->total; +out: + up(&table->vlan_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_vlan); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_cmd_mailbox *mailbox; + int err; + u8 is_eth = (dev->caps.port_type[port] == MLX4_PORT_TYPE_ETH) ? 
1 : 0; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memset(mailbox->buf, 0, 256); + if (is_eth) { + ((u8 *) mailbox->buf)[3] = 7; + ((__be16 *) mailbox->buf)[3] = + cpu_to_be16(dev->caps.eth_mtu_cap[port] + + ETH_HLEN + ETH_FCS_LEN); + ((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15); + ((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15); + } + err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index fa24e65..1b2b7c4 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -147,19 +147,42 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, } EXPORT_SYMBOL_GPL(mlx4_qp_modify); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp) +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + int qpn; + + qpn = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align); + if (qpn == -1) + return -ENOMEM; + + *base = qpn; + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_reserve_range); + +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + if (base_qpn < dev->caps.sqp_start + 8) + return; + + mlx4_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt); +} +EXPORT_SYMBOL_GPL(mlx4_qp_release_range); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_qp_table *qp_table = &priv->qp_table; int err; - if (sqpn) - qp->qpn = sqpn; - else { - qp->qpn = mlx4_bitmap_alloc(&qp_table->bitmap); - if (qp->qpn == -1) - return -ENOMEM; - } + if (!qpn) + return -EINVAL; + + qp->qpn = qpn; err = mlx4_table_get(dev, &qp_table->qp_table, 
qp->qpn); if (err) @@ -208,9 +231,6 @@ err_put_qp: mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); err_out: - if (!sqpn) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); - return err; } EXPORT_SYMBOL_GPL(mlx4_qp_alloc); @@ -240,8 +260,6 @@ void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp) mlx4_table_put(dev, &qp_table->auxc_table, qp->qpn); mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); - if (qp->qpn >= dev->caps.sqp_start + 8) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); } EXPORT_SYMBOL_GPL(mlx4_qp_free); @@ -255,6 +273,7 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) { struct mlx4_qp_table *qp_table = &mlx4_priv(dev)->qp_table; int err; + int reserved_from_top = 0; spin_lock_init(&qp_table->lock); INIT_RADIX_TREE(&dev->qp_table_tree, GFP_ATOMIC); @@ -264,9 +283,45 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) * block of special QPs must be aligned to a multiple of 8, so * round up. */ - dev->caps.sqp_start = ALIGN(dev->caps.reserved_qps, 8); - err = mlx4_bitmap_init(&qp_table->bitmap, dev->caps.num_qps, - (1 << 24) - 1, dev->caps.sqp_start + 8); + dev->caps.sqp_start = + ALIGN(dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], 8); + + { + int sort[MLX4_QP_REGION_COUNT]; + int i, j, tmp; + int last_base = dev->caps.num_qps; + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) + sort[i] = i; + + for (i = MLX4_QP_REGION_COUNT; i > 0; --i) { + for (j = 2; j < i; ++j) { + if (dev->caps.reserved_qps_cnt[sort[j]] > + dev->caps.reserved_qps_cnt[sort[j - 1]]) { + tmp = sort[j]; + sort[j] = sort[j - 1]; + sort[j - 1] = tmp; + } + } + } + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) { + last_base -= dev->caps.reserved_qps_cnt[sort[i]]; + dev->caps.reserved_qps_base[sort[i]] = last_base; + reserved_from_top += + dev->caps.reserved_qps_cnt[sort[i]]; + } + + } + + err = mlx4_bitmap_init_with_effective_max(&qp_table->bitmap, + dev->caps.num_qps, + (1 << 23) - 1, + dev->caps.sqp_start + 8, + dev->caps.num_qps - + reserved_from_top); + + if (err) return err; @@ -279,6 
+334,20 @@ void mlx4_cleanup_qp_table(struct mlx4_dev *dev) mlx4_bitmap_cleanup(&mlx4_priv(dev)->qp_table.bitmap); } +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, + int *base_qpn, int *cnt) +{ + if ((region < 0) || (region >= MLX4_QP_REGION_COUNT)) + return -EINVAL; + + *base_qpn = dev->caps.reserved_qps_base[region]; + *cnt = dev->caps.reserved_qps_cnt[region]; + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_get_region); + int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context) { @@ -299,3 +368,35 @@ int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, } EXPORT_SYMBOL_GPL(mlx4_qp_query); +int mlx4_qp_to_ready(struct mlx4_dev *dev, + struct mlx4_mtt *mtt, + struct mlx4_qp_context *context, + struct mlx4_qp *qp, + enum mlx4_qp_state *qp_state) +{ +#define STATE_ARR_SIZE 4 + int err = 0; + int i; + enum mlx4_qp_state states[STATE_ARR_SIZE] = { + MLX4_QP_STATE_RST, + MLX4_QP_STATE_INIT, + MLX4_QP_STATE_RTR, + MLX4_QP_STATE_RTS + }; + + for (i = 0; i < STATE_ARR_SIZE - 1; i++) { + context->flags |= cpu_to_be32(states[i+1] << 28); + err = mlx4_qp_modify(dev, mtt, states[i], + states[i+1], context, 0, 0, qp); + if (err) { + mlx4_err(dev, "Failed to bring qp to state:" + "%d with error: %d\n", + states[i+1], err); + return err; + } + *qp_state = states[i+1]; + } + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_to_ready); + diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h index 77323a7..cf9c679 100644 --- a/include/linux/mlx4/cmd.h +++ b/include/linux/mlx4/cmd.h @@ -132,6 +132,15 @@ enum { MLX4_MAILBOX_SIZE = 4096 }; +enum { + /* set port opcode modifiers */ + MLX4_SET_PORT_GENERAL = 0x0, + MLX4_SET_PORT_RQP_CALC = 0x1, + MLX4_SET_PORT_MAC_TABLE = 0x2, + MLX4_SET_PORT_VLAN_TABLE = 0x3, + MLX4_SET_PORT_PRIO_MAP = 0x4, +}; + struct mlx4_dev; struct mlx4_cmd_mailbox { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ff7df1a..2d08c4f 100644 --- a/include/linux/mlx4/device.h 
+++ b/include/linux/mlx4/device.h @@ -60,6 +60,7 @@ enum { MLX4_DEV_CAP_FLAG_IPOIB_CSUM = 1 << 7, MLX4_DEV_CAP_FLAG_BAD_PKEY_CNTR = 1 << 8, MLX4_DEV_CAP_FLAG_BAD_QKEY_CNTR = 1 << 9, + MLX4_DEV_CAP_FLAG_DPDP = 1 << 12, MLX4_DEV_CAP_FLAG_MEM_WINDOW = 1 << 16, MLX4_DEV_CAP_FLAG_APM = 1 << 17, MLX4_DEV_CAP_FLAG_ATOMIC = 1 << 18, @@ -133,6 +134,23 @@ enum { MLX4_STAT_RATE_OFFSET = 5 }; +enum qp_region { + MLX4_QP_REGION_FW = 0, + MLX4_QP_REGION_ETH_ADDR, + MLX4_QP_REGION_FC_ADDR, + MLX4_QP_REGION_FC_EXCH, + MLX4_QP_REGION_COUNT /* Must be last */ +}; + +enum mlx4_port_type { + MLX4_PORT_TYPE_IB = 1 << 0, + MLX4_PORT_TYPE_ETH = 1 << 1, +}; + +enum { + MLX4_NUM_FEXCH = 64 * 1024, +}; + static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor) { return (major << 32) | (minor << 16) | subminor; @@ -142,7 +160,9 @@ struct mlx4_caps { u64 fw_ver; int num_ports; int vl_cap[MLX4_MAX_PORTS + 1]; - int mtu_cap[MLX4_MAX_PORTS + 1]; + int ib_mtu_cap[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu_cap[MLX4_MAX_PORTS + 1]; int gid_table_len[MLX4_MAX_PORTS + 1]; int pkey_table_len[MLX4_MAX_PORTS + 1]; int local_ca_ack_delay; @@ -157,7 +177,6 @@ struct mlx4_caps { int max_rq_desc_sz; int max_qp_init_rdma; int max_qp_dest_rdma; - int reserved_qps; int sqp_start; int num_srqs; int max_srq_wqes; @@ -187,6 +206,13 @@ struct mlx4_caps { u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; int max_gso_sz; + int reserved_qps_cnt[MLX4_QP_REGION_COUNT]; + int reserved_qps_base[MLX4_QP_REGION_COUNT]; + int log_num_macs; + int log_num_vlans; + int log_num_prios; + enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1]; + int reserved_fexch_mpts_base; }; struct mlx4_buf_list { @@ -208,6 +234,34 @@ struct mlx4_mtt { int page_shift; }; +enum { + MLX4_DB_PER_PAGE = PAGE_SIZE / 4 +}; + +struct mlx4_db_pgdir { + struct list_head list; + DECLARE_BITMAP(order0, MLX4_DB_PER_PAGE); + DECLARE_BITMAP(order1, MLX4_DB_PER_PAGE / 2); + unsigned long *bits[2]; + __be32 *db_page; 
+ dma_addr_t db_dma; +}; + +struct mlx4_db { + __be32 *db; + struct mlx4_db_pgdir *pgdir; + dma_addr_t dma; + int index; + int order; +}; + + +struct mlx4_hwq_resources { + struct mlx4_db db; + struct mlx4_mtt mtt; + struct mlx4_buf buf; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -247,6 +301,7 @@ struct mlx4_cq { int arm_sn; int cqn; + int comp_eq_idx; atomic_t refcount; struct completion free; @@ -309,6 +364,36 @@ struct mlx4_init_port_param { u64 si_guid; }; +static inline void mlx4_query_steer_cap(struct mlx4_dev *dev, int *log_mac, + int *log_vlan, int *log_prio) +{ + *log_mac = dev->caps.log_num_macs; + *log_vlan = dev->caps.log_num_vlans; + *log_prio = dev->caps.log_num_prios; +} + +static inline u32 mlx4_get_ports_of_type(struct mlx4_dev *dev, + enum mlx4_port_type ptype) +{ + u32 ret = 0; + int i; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.port_type[i] == ptype) + ret |= 1 << (i-1); + } + return ret; +} + +#define foreach_port(port, bitmap) \ + for ((port) = 1; (port) <= MLX4_MAX_PORTS; ++(port)) \ + if (bitmap & 1 << ((port)-1)) + +static inline int mlx4_get_fexch_mpts_base(struct mlx4_dev *dev) +{ + return dev->caps.reserved_fexch_mpts_base; +} + int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, struct mlx4_buf *buf); void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf); @@ -332,8 +417,12 @@ int mlx4_mtt_init(struct mlx4_dev *dev, int npages, int page_shift, void mlx4_mtt_cleanup(struct mlx4_dev *dev, struct mlx4_mtt *mtt); u64 mlx4_mtt_addr(struct mlx4_dev *dev, struct mlx4_mtt *mtt); +int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u64 iova, u64 size, u32 access, int npages, + int page_shift, struct mlx4_mr *mr); int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, int npages, int page_shift, struct mlx4_mr *mr); +void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr); void mlx4_mr_free(struct mlx4_dev *dev, struct 
mlx4_mr *mr); int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr); int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, @@ -341,11 +430,20 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, struct mlx4_buf *buf); +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size, int max_direct); +void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + unsigned vector, int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp); +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp); void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp); int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, @@ -360,14 +458,26 @@ int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port); int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index); +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index); +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index); +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); + int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, 
struct mlx4_fmr *fmr, + u64 *page_list, int npages, u64 iova, + u32 fbo, u32 len, u32 *lkey, u32 *rkey); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, int max_maps, u8 page_shift, struct mlx4_fmr *fmr); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u32 access, int max_pages, int max_maps, + u8 page_shift, struct mlx4_fmr *fmr); int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr); void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u32 *lkey, u32 *rkey); int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_SYNC_TPT(struct mlx4_dev *dev); #endif /* MLX4_DEVICE_H */ diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index a5e43fe..5a02980 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -151,7 +151,16 @@ struct mlx4_qp_context { u8 reserved4[2]; u8 mtt_base_addr_h; __be32 mtt_base_addr_l; - u32 reserved5[10]; + u8 VE; + u8 reserved5; + __be16 VFT_id_prio; + u8 reserved6; + u8 exch_size; + __be16 exch_base; + u8 VFT_hop_cnt; + u8 my_fc_id_idx; + __be16 reserved7; + u32 reserved8[7]; }; /* Which firmware version adds support for NEC (NoErrorCompletion) bit */ @@ -296,6 +305,10 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context); +int mlx4_qp_to_ready(struct mlx4_dev *dev, struct mlx4_mtt *mtt, + struct mlx4_qp_context *context, + struct mlx4_qp *qp, enum mlx4_qp_state *qp_state); + static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn) { return radix_tree_lookup(&dev->qp_table_tree, qpn & (dev->caps.num_qps - 1)); @@ -303,4 +316,8 @@ static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn) void mlx4_qp_remove(struct mlx4_dev *dev, struct mlx4_qp *qp); +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, 
+ int *base_qpn, int *cnt); + #endif /* MLX4_QP_H */ -- 1.5.4 _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From yevgenyp at mellanox.co.il Wed Apr 16 01:05:45 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 16 Apr 2008 11:05:45 +0300 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support Message-ID: <4805B359.2070906@mellanox.co.il> Multi Protocol support gives the user the ability to run InfiniBand and Ethernet protocols on the same HCA (separately or at the same time). Main changes to mlx4_ib: 1. The mlx4_ib driver queries the low-level driver for the number of IB ports. 2. QPs are reserved prior to being allocated. 3. The CQ allocation API has changed. Signed-off-by: Yevgeny Petrilin Reviewed-by: Eli Cohen --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mlx4/mad.c | 6 +++--- drivers/infiniband/hw/mlx4/main.c | 15 ++++++++++++--- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 ++ drivers/infiniband/hw/mlx4/qp.c | 9 +++++++++ 5 files changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 3557e7e..912b35c 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq); + cq->db.dma, &cq->mcq, vector, 0); if (err) goto err_dbmap; diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 4c1e72f..d91ba56 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -297,7 +297,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) int p, q; int ret; - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < 
dev->num_ports; ++p) for (q = 0; q <= 1; ++q) { agent = ib_register_mad_agent(&dev->ib_dev, p + 1, q ? IB_QPT_GSI : IB_QPT_SMI, @@ -313,7 +313,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) return 0; err: - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) if (dev->send_agent[p][q]) ib_unregister_mad_agent(dev->send_agent[p][q]); @@ -326,7 +326,7 @@ void mlx4_ib_mad_cleanup(struct mlx4_ib_dev *dev) struct ib_mad_agent *agent; int p, q; - for (p = 0; p < dev->dev->caps.num_ports; ++p) { + for (p = 0; p < dev->num_ports; ++p) { for (q = 0; q <= 1; ++q) { agent = dev->send_agent[p][q]; dev->send_agent[p][q] = NULL; diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 136c76c..fd0b8c0 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -112,7 +112,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props->max_mr_size = ~0ull; props->page_size_cap = dev->dev->caps.page_size_cap; - props->max_qp = dev->dev->caps.num_qps - dev->dev->caps.reserved_qps; + props->max_qp = dev->dev->caps.num_qps - + dev->dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW]; props->max_qp_wr = dev->dev->caps.max_wqes; props->max_sge = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg); @@ -552,11 +553,15 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) mutex_init(&ibdev->pgdir_mutex); ibdev->dev = dev; + ibdev->ports_map = mlx4_get_ports_of_type(dev, MLX4_PORT_TYPE_IB); strlcpy(ibdev->ib_dev.name, "mlx4_%d", IB_DEVICE_NAME_MAX); ibdev->ib_dev.owner = THIS_MODULE; ibdev->ib_dev.node_type = RDMA_NODE_IB_CA; - ibdev->ib_dev.phys_port_cnt = dev->caps.num_ports; + ibdev->num_ports = 0; + foreach_port(i, ibdev->ports_map) + ibdev->num_ports++; + ibdev->ib_dev.phys_port_cnt = ibdev->num_ports; ibdev->ib_dev.num_comp_vectors = 1; ibdev->ib_dev.dma_device = &dev->pdev->dev; @@ -670,7 +675,7 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void 
*ibdev_ptr) struct mlx4_ib_dev *ibdev = ibdev_ptr; int p; - for (p = 1; p <= dev->caps.num_ports; ++p) + for (p = 1; p <= ibdev->num_ports; ++p) mlx4_CLOSE_PORT(dev, p); mlx4_ib_mad_cleanup(ibdev); @@ -685,6 +690,10 @@ static void mlx4_ib_event(struct mlx4_dev *dev, void *ibdev_ptr, enum mlx4_dev_event event, int port) { struct ib_event ibev; + struct mlx4_ib_dev *ibdev = to_mdev((struct ib_device *) ibdev_ptr); + + if (port > ibdev->num_ports) + return; switch (event) { case MLX4_DEV_EVENT_PORT_UP: diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 9e63732..7a8111c 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -173,6 +173,8 @@ struct mlx4_ib_ah { struct mlx4_ib_dev { struct ib_device ib_dev; struct mlx4_dev *dev; + u32 ports_map; + int num_ports; void __iomem *uar_map; struct list_head pgdir_list; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index b75efae..59f7284 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -544,6 +544,11 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } } + if (!sqpn) + err = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn); + if (err) + goto err_wrid; + err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp); if (err) goto err_wrid; @@ -654,6 +659,10 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, mlx4_ib_unlock_cqs(send_cq, recv_cq); mlx4_qp_free(dev->dev, &qp->mqp); + + if (!is_sqp(dev, qp)) + mlx4_qp_release_range(dev->dev, qp->mqp.qpn, 1); + mlx4_mtt_cleanup(dev->dev, &qp->mtt); if (is_user) { -- 1.5.4 From dorfman.eli at gmail.com Wed Apr 16 01:22:04 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Wed, 16 Apr 2008 11:22:04 +0300 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: References: <4804B03C.6060507@voltaire.com> Message-ID: 
<694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> According to Mike's explanation below it seems that we have a bug in the iSER initiator. Fixing this bug will require a fix in the stgt iSER code. The problem is that the initiator sends a VA which already includes an offset for the unsolicited data (which is wrong). In iser_initiator.c::iser_prepare_write_cmd the code looks like this: hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz); we think that it should be modified to: hdr->write_va = cpu_to_be64(regd_buf->reg.va); Let's discuss this and verify we interpret the spec correctly. If agreed we will send a patch. Eli 2008/4/15 Mike Ko : > > VA is a concept introduced in an Infiniband annex to support iSER. It > appears in the expanded iSER header for Infiniband use only to support the > non-Zero Based Virtual Address (non-ZBVA) used in Infiniband vs the ZBVA > used in IETF. > > "The DataDescriptorOut describes the I/O buffer starting with the immediate > unsolicited data (if any), followed by the non-immediate unsolicited data > (if any) and solicited data." If non-ZBVA mode is used, then VA points to > the beginning of this buffer. So in your example, the VA field in the > expanded iSER header will be zero. Note that for IETF, ZBVA is assumed and > there is no provision to specify a different VA in the iSER header. > > Tagged offset (TO) refers to the offset within a tagged buffer in RDMA Write > and RDMA Read Request Messages. When sending non-immediate unsolicited > data, Send Message types are used and the TO field is not present. Instead, > the buffer offset is appropriately represented by the Buffer Offset field in > the SCSI Data-Out PDU. Note that Tagged Offset is not the same as write VA > and it does not appear in the iSER header. 
> > Mike > > > > Erez Zilber > Sent by: ips-bounces at ietf.org > > 04/15/2008 06:40 AM > > To ips at ietf.org > > cc > > Subject [Ips] Calculating the VA in iSER header > > > > > > > We're trying to understand what should be the write VA (tagged offset) > in the iSER header for WRITE commands. If unsolicited data is to be > sent, should the VA be the original VA or should it be original VA + > FirstBurstLength? > > > Example: > > > InitialR2T=No > > FirstBurstLength = 1000 > > > Base address of the registered buffer = 0 > > > Now, what should be the VA in the iSER header? 0 or 1000? > > > We read the following paragraph in the iSER spec, but didn't get an > answer from there: > > > * If there is solicited data to be transferred for the SCSI write or > bidirectional command, as indicated by the Expected Data Transfer > Length in the SCSI Command PDU exceeding the value of > UnsolicitedDataSize, the iSER layer at the initiator MUST do the > following: > > a. It MUST allocate a Write STag for the I/O Buffer defined by > the qualifier DataDescriptorOut. The DataDescriptorOut > describes the I/O buffer starting with the immediate > unsolicited data (if any), followed by the non-immediate > unsolicited data (if any) and solicited data. This means > that the BufferOffset for the SCSI Data-out for this > command is equal to the TO. This implies that a zero TO > for this STag points to the beginning of this I/O Buffer. 
> > > Thanks, > > -- > > ____________________________________________________________ > > Erez Zilber | 972-9-971-7689 > > Software Engineer, Storage Solutions > > Voltaire – _The Grid Backbone_ > > __ > > www.voltaire.com > > > > _______________________________________________ > Ips mailing list > Ips at ietf.org > https://www.ietf.org/mailman/listinfo/ips > > From vlad at dev.mellanox.co.il Wed Apr 16 05:52:34 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 16 Apr 2008 15:52:34 +0300 Subject: [ofa-general] ofed_kernel git tree for OFED-1.4 (based on 2.6.25-rc7) Message-ID: <4805F692.1040101@dev.mellanox.co.il> Hi Ralph, I prepared ofed_kernel git tree: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel. This tree merged with 2.6.25-rc7. Currently ofed_scripts/ofed_makedist.sh fails on ipath_0180_header_file_changes_to_support_IBA7220.patch: > ./ofed_scripts/ofed_makedist.sh git clone -q -s -n /local/scm/ofed-1.4/linux-2.6 /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 Initialized empty Git repository in /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11/.git/ pushd /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 /local/scm/ofed-1.4/linux-2.6 /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_checkout.sh 3bb85a2f1c15d1e58cd8b0b2da0577a3ab98977a cdbdfc5cc29c4add1a2d6967b137a3347112a199 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log Failed executing /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log Hunk #7 FAILED at 565. Hunk #8 succeeded at 582 (offset 1 line). Hunk #9 succeeded at 595 (offset 1 line). 
Hunk #10 FAILED at 613. Hunk #11 succeeded at 719 (offset 2 lines). Hunk #12 FAILED at 857. 3 out of 12 hunks FAILED -- rejects in file drivers/infiniband/hw/ipath/ipath_verbs.h Patch ipath_0180_header_file_changes_to_support_IBA7220.patch does not apply (enforce with -f) Failed executing /usr/bin/quilt Build failed in /tmp/build-ofed_kernel-d23175 See log file /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log Should ipath patches be removed from the git tree (kernel_patches/fixes/ipath*)? Regards, Vladimir From amar.mudrankit at qlogic.com Wed Apr 16 07:25:23 2008 From: amar.mudrankit at qlogic.com (Amar Mudrankit (Contractor - )) Date: Wed, 16 Apr 2008 09:25:23 -0500 Subject: [ofa-general] Kernel Panic on Stress Testing of OFED-1.3 IPoIB Driver Message-ID: Hello, I observed a kernel panic while performing stress tests on the IPoIB driver over OFED-1.3. The stress test was run on a test setup consisting of one mthca machine and one connectX machine. 5 iperf (4 UDP and 1 TCP) streams were started over IPoIB interfaces on both machines. The details of the panic as well as test steps are captured in the bug: https://bugs.openfabrics.org//show_bug.cgi?id=1004 Thanks, Amar S Mudrankit -------------- next part -------------- An HTML attachment was scrubbed... URL: From pw at osc.edu Wed Apr 16 07:48:30 2008 From: pw at osc.edu (Pete Wyckoff) Date: Wed, 16 Apr 2008 10:48:30 -0400 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> Message-ID: <20080416144830.GC23861@osc.edu> dorfman.eli at gmail.com wrote on Wed, 16 Apr 2008 11:22 +0300: > According to Mike's explanation below it seems that we have a bug in > iSER initiator. > Fixing this bug will require a fix in the stgt iSER code. 
> > The problem is that the initiator send a VA which already includes an > offset for the unsolicited data (which is wrong). > In iser_initiator.c::iser_prepare_write_cmd the code looks like this: > hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz); > > we think that it should be modified to: > hdr->write_va = cpu_to_be64(regd_buf->reg.va); > > Let's discuss this and verify we interpret the spec correctly. > If agreed we will send a patch. Agree with the interpretation of the spec, and it's probably a bit clearer that way too. But we have working initiators and targets that do it the "wrong" way. The transition involved in fixing both sides will lead to problems. How does a target detect an unfixed initiator and vice versa? A mismatched pair will lead to data corruption. We could address this in a few ways: 1. Flag day: all initiators and targets change at the same time. Will see data corruption if someone unluckily runs one or the other using old non-fixed code. 2. Rewrite the IB Annex to codify what's done in practice, and don't "fix" any code. 3. Start using the Hello messages and extend them to specify if the VA marks the start of the buffer or the unsol offset. I really don't look forward to the bug reports we'll get from a flag day approach. Old Linux versions tend to hang around for a very long time, and people are often reluctant to upgrade. -- Pete > 2008/4/15 Mike Ko : > > > > VA is a concept introduced in an Infiniband annex to support iSER. It > > appears in the expanded iSER header for Infiniband use only to support the > > non-Zero Based Virtual Address (non-ZBVA) used in Infiniband vs the ZBVA > > used in IETF. > > > > "The DataDescriptorOut describes the I/O buffer starting with the immediate > > unsolicited data (if any), followed by the non-immediate unsolicited data > > (if any) and solicited data." If non-ZBVA mode is used, then VA points to > > the beginning of this buffer. 
So in your example, the VA field in the > > expanded iSER header will be zero. Note that for IETF, ZBVA is assumed and > > there is no provision to specify a different VA in the iSER header. > > > > Tagged offset (TO) refers to the offset within a tagged buffer in RDMA Write > > and RDMA Read Request Messages. When sending non-immediate unsolicited > > data, Send Message types are used and the TO field is not present. Instead, > > the buffer offset is appropriately represented by the Buffer Offset field in > > the SCSI Data-Out PDU. Note that Tagged Offset is not the same as write VA > > and it does not appear in the iSER header. > > > > Mike > > > > Erez Zilber > > Sent by: ips-bounces at ietf.org > > > > 04/15/2008 06:40 AM > > > > To ips at ietf.org > > > > cc > > > > Subject [Ips] Calculating the VA in iSER header > > > > We're trying to understand what should be the write VA (tagged offset) > > in the iSER header for WRITE commands. If unsolicited data is to be > > sent, should the VA be the original VA or should it be original VA + > > FirstBurstLength? > > > > > > Example: > > > > > > InitialR2T=No > > > > FirstBurstLength = 1000 > > > > > > Base address of the registered buffer = 0 > > > > > > Now, what should be the VA in the iSER header? 0 or 1000? > > > > > > We read the following paragraph in the iSER spec, but didn't get an > > answer from there: > > > > > > * If there is solicited data to be transferred for the SCSI write or > > bidirectional command, as indicated by the Expected Data Transfer > > Length in the SCSI Command PDU exceeding the value of > > UnsolicitedDataSize, the iSER layer at the initiator MUST do the > > following: > > > > a. It MUST allocate a Write STag for the I/O Buffer defined by > > the qualifier DataDescriptorOut. The DataDescriptorOut > > describes the I/O buffer starting with the immediate > > unsolicited data (if any), followed by the non-immediate > > unsolicited data (if any) and solicited data. 
This means > > that the BufferOffset for the SCSI Data-out for this > > command is equal to the TO. This implies that a zero TO > > for this STag points to the beginning of this I/O Buffer. From rdreier at cisco.com Wed Apr 16 08:34:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 08:34:23 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: Your email has > Content-Type: text/plain; charset=ISO-8859-1; format=flowed and the format=flowed means that the patch gets corrupted and won't apply. So when you resend, please fix. I don't think we can really apply this as one patch -- it does too many things at once and needs to be split up... I think pretty much each of these items is independent and could be a separate patch: > 1. Mlx4 device now holds the actual protocol for each port. > The port types are determined through module parameters of through sysfs > interface. The requested types are verified with firmware capabilities > in order to determine the actual port protocol. > 2. The driver now manages Mac and Vlan tables used by customers of the low > level driver. Corresponding commands were added. > 3. Completion eq's are created per cpu. Created cq's are attached to an eq by > "Round Robin" algorithm, unless a specific eq was requested. > 4. Creation of a collapsed cq support was added. > 5. Additional reserved qp ranges were added. There is a range for the customers > of the low level driver (IB, Ethernet, FCoE). > 6. Qp allocation process changed. > First a qp range should be reserved, then qps can be allocated from that > range. This is to support the ability to allocate consecutive qps. > Appropriate changes were made in the allocation mechanism. > 7. 
Common actions to all HW resource management (Doorbell allocation, > Buffer allocation, Mtt write) were moved to the low level driver. Also, on the other hand, the current two patches are too split up: if I apply this patch then mlx4_ib won't compile until the second patch goes in too. Which means someone trying to bisect an mlx4 bug gets into trouble. So please make sure that everything still compiles and works after each patch is applied. By the way, the multiple EQ stuff is a pretty major change in behavior... are we really ready for this? Round robin seems like it could easily lead to worst-case behavior for some plausible workloads. Finally, checkpatch.pl shows a few minor whitespace problems... please fix when you resend. - R. From rdreier at cisco.com Wed Apr 16 08:42:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 08:42:47 -0700 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support In-Reply-To: <4805B359.2070906@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 11:05:45 +0300") References: <4805B359.2070906@mellanox.co.il> Message-ID: > Main changes to mlx4_ib: > 1. Mlx4_ib driver queries the low level driver for number of IB ports. > 2. Qps are being reserved prior to being allocated. > 3. Cq allocation API change. As I said before, these mlx4_ib changes should be rolled into the mlx4_core patches that change these interfaces. Also, I don't understand exactly how you're handling which ports are IB and which aren't. Have you tested this code in the case where port 1 is non-IB and port 2 is IB? 
It seems that you have a bitmap of which ports are IB: > + foreach_port(i, ibdev->ports_map) > + ibdev->num_ports++; (By the way, foreach_port() is too generic a name to expose, since it could easily collide with some general API -- I would use mlx4_foreach_port() instead) But then you do stuff like: > - for (p = 1; p <= dev->caps.num_ports; ++p) > + for (p = 1; p <= ibdev->num_ports; ++p) > mlx4_CLOSE_PORT(dev, p); which doesn't seem to work if you only have one IB port but it isn't port 1. I think there are two sane ways to handle non-IB ports in mlx4_ib: - Have mlx4_ib report the number of IB ports as phys_port_cnt and have an indirection table that maps from IB port # to physical HCA port # (to handle the case where only port 2 is IB, so you need to map IB port 1 to HCA physical port 2). This leads to some confusion with the real-world labels on ports I guess, and also I guess you need some SMA trickery to report the right port # to the SM. - Report the number of physical HCA ports as phys_port_cnt and just have non-IB ports always say they're DOWN. This makes changing config on the fly easier, since a port going from DOWN to INIT is a pretty normal thing. I guess there is a little bit of hackery involved in handling requests to mlx4_ib that involve non-IB ports. However your changes seem to take a third way and I don't understand how it can work. Perhaps you can clarify? - R. From rdreier at cisco.com Wed Apr 16 08:46:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 08:46:25 -0700 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: <20080416144830.GC23861@osc.edu> (Pete Wyckoff's message of "Wed, 16 Apr 2008 10:48:30 -0400") References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> <20080416144830.GC23861@osc.edu> Message-ID: > Agree with the interpretation of the spec, and it's probably a bit > clearer that way too. 
But we have working initiators and targets > that do it the "wrong" way. Yes... I guess the key question is whether there are any initiators that do things the "right" way. > 1. Flag day: all initiators and targets change at the same time. > Will see data corruption if someone unluckily runs one or the other > using old non-fixed code. Seems unacceptable to me... it doesn't make sense at all to break every setup in the world just to be "right" according to the spec. > 2. Rewrite the IB Annex to codify what's done in practice, and don't > "fix" any code. If existing practice is universally to do things "wrong" then this seems to me by far the best way to proceed. > 3. Start using the Hello messages and extend them to specify if the > VA marks the start of the buffer or the unsol offset. this seems like a pain for not much benefit... every initiator and target needs new code to handle the negotiation, and you don't get anything except the satisfaction of following the letter of the spec. From rdreier at cisco.com Wed Apr 16 08:47:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 08:47:12 -0700 Subject: [ofa-general] Kernel Panic on Stress Testing of OFED-1.3 IPoIB Driver In-Reply-To: (Amar Mudrankit's message of "Wed, 16 Apr 2008 09:25:23 -0500") References: Message-ID: > https://bugs.openfabrics.org//show_bug.cgi?id=1004 Has anyone tried this with an upstream kernel (rather than OFED-1.3)? 2.6.25-rc9 or my for-2.6.26 branch would both be useful. - R. From holt at sgi.com Wed Apr 16 09:33:37 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 16 Apr 2008 11:33:37 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: Message-ID: <20080416163337.GJ22493@sgi.com> I don't think this lock mechanism is completely working. I have gotten a few failures trying to dereference 0x100100 which appears to be LIST_POISON1. 
Thanks, Robin From jeff at splitrockpr.com Wed Apr 16 09:55:12 2008 From: jeff at splitrockpr.com (Jeffrey Scott) Date: Wed, 16 Apr 2008 09:55:12 -0700 Subject: [ofa-general] OpenFabrics Sonoma presentations now available Message-ID: <89260B536D004F29B5FD9E10996DEF13@Gaucho> Presentations from the Sonoma Workshop are now available for download on the OpenFabrics website. http://www.openfabrics.org/archives/april2008sonoma.htm ----------------------------------- Jeffrey Scott Split Rock Communications 408-884-4017 408-348-3651 Mobile 408-884-3900 Fax www.SplitRockPR.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From weiny2 at llnl.gov Wed Apr 16 10:47:29 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Wed, 16 Apr 2008 10:47:29 -0700 Subject: [ofa-general] OpenFabrics Sonoma presentations now available In-Reply-To: <89260B536D004F29B5FD9E10996DEF13@Gaucho> References: <89260B536D004F29B5FD9E10996DEF13@Gaucho> Message-ID: <20080416104729.6e203753.weiny2@llnl.gov> On Wed, 16 Apr 2008 09:55:12 -0700 "Jeffrey Scott" wrote: > Presentations from the Sonoma Workshop are now available for download on the > OpenFabrics website. > > > > http://www.openfabrics.org/archives/april2008sonoma.htm > Are you still waiting for slides from some participants? I don't see slides for Endance or "Experience with Ranger System". Thanks, Ira From ralph.campbell at qlogic.com Wed Apr 16 10:47:25 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 16 Apr 2008 10:47:25 -0700 Subject: [ofa-general] Re: ofed_kernel git tree for OFED-1.4 (based on 2.6.25-rc7) In-Reply-To: <4805F692.1040101@dev.mellanox.co.il> References: <4805F692.1040101@dev.mellanox.co.il> Message-ID: <1208368045.8715.187.camel@brick.pathscale.com> On Wed, 2008-04-16 at 15:52 +0300, Vladimir Sokolovsky wrote: > Hi Ralph, > I prepared ofed_kernel git tree: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel. > This tree merged with 2.6.25-rc7. 
> Currently ofed_scripts/ofed_makedist.sh fails on ipath_0180_header_file_changes_to_support_IBA7220.patch: > > > ./ofed_scripts/ofed_makedist.sh > > git clone -q -s -n /local/scm/ofed-1.4/linux-2.6 /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 > Initialized empty Git repository in /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11/.git/ > pushd /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 > /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 /local/scm/ofed-1.4/linux-2.6 /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_checkout.sh 3bb85a2f1c15d1e58cd8b0b2da0577a3ab98977a > cdbdfc5cc29c4add1a2d6967b137a3347112a199 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > Failed executing /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > Hunk #7 FAILED at 565. > Hunk #8 succeeded at 582 (offset 1 line). > Hunk #9 succeeded at 595 (offset 1 line). > Hunk #10 FAILED at 613. > Hunk #11 succeeded at 719 (offset 2 lines). > Hunk #12 FAILED at 857. > 3 out of 12 hunks FAILED -- rejects in file drivers/infiniband/hw/ipath/ipath_verbs.h > Patch ipath_0180_header_file_changes_to_support_IBA7220.patch does not apply (enforce with -f) > > Failed executing /usr/bin/quiltBuild failed in /tmp/build-ofed_kernel-d23175 See log file /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > > Should ipath patches be removed from the git tree (kernel_patches/fixes/ipath*)? > > Regards, > Vladimir No, I will take a look and fix things. Once 2.6.26 opens we can probably delete kernel_patches/fixes/ipath*. 
From clameter at sgi.com Wed Apr 16 11:35:38 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 16 Apr 2008 11:35:38 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080416163337.GJ22493@sgi.com> References: <20080416163337.GJ22493@sgi.com> Message-ID: On Wed, 16 Apr 2008, Robin Holt wrote: > I don't think this lock mechanism is completely working. I have > gotten a few failures trying to dereference 0x100100 which appears to > be LIST_POISON1. How does xpmem unregistering of notifiers work? From Jeffrey.C.Becker at nasa.gov Wed Apr 16 11:38:34 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Wed, 16 Apr 2008 11:38:34 -0700 Subject: [ofa-general] OpenFabrics Sonoma presentations now available In-Reply-To: <20080416104729.6e203753.weiny2@llnl.gov> References: <89260B536D004F29B5FD9E10996DEF13@Gaucho> <20080416104729.6e203753.weiny2@llnl.gov> Message-ID: <480647AA.60905@nasa.gov> Hi Ira. Ira Weiny wrote: > On Wed, 16 Apr 2008 09:55:12 -0700 > "Jeffrey Scott" wrote: > > >> Presentations from the Sonoma Workshop are now available for download on the >> OpenFabrics website. >> >> >> >> http://www.openfabrics.org/archives/april2008sonoma.htm >> >> > > Are you still waiting for slides from some participants? I don't see slides > for Endance or "Experience with Ranger System". > Yes, that is one of the few presentations I'm still waiting for. 
-jeff > Thanks, > Ira > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Wed Apr 16 11:49:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:49:22 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: You have > +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) > +{ > + struct mlx4_db_pgdir *pgdir; > + > + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); > + if (!pgdir) > + return NULL; > + > + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); and so on... If you're going to move the doorbell stuff from mlx4_ib to mlx4_core, that's fine, but really move it: you should remove the code from mlx4_ib and use the stuff in mlx4_core rather than having the same stuff duplicated in two places. Especially since as this patch stands now, there are *no* users for the doorbell code in mlx4_core. - R. From rdreier at cisco.com Wed Apr 16 11:52:27 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:52:27 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > + if (vector == 0) { > + vector = priv->eq_table.last_comp_eq % > + priv->eq_table.num_comp_eqs + 1; > + priv->eq_table.last_comp_eq = vector; > + } The current IB code is written assuming that 0 is a normal completion vector I think. Making 0 be a special "round robin" value is a pretty big change of policy. 
Also there is no locking of last_comp_eq that I can see here, although maybe it doesn't matter. > + req_eqs = (dev->flags & MLX4_FLAG_MSI_X) ? num_online_cpus() : 1; I don't think num_online_cpus() is the right thing really... what if a CPU is hot-plugged later? num_possible_cpus() seems better to me. - R. From rdreier at cisco.com Wed Apr 16 11:53:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:53:12 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > - .num_mpt = 1 << 17, > + .num_mpt = 1 << 18, Why this change? From rdreier at cisco.com Wed Apr 16 11:53:43 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:53:43 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > +static int mod_param_num_mac = 1; > +module_param_named(num_mac, mod_param_num_mac, int, 0444); Why prefix these with "mod_param_"? Seems to make things a little harder to read. 
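The round-robin completion-vector selection Roland questions above can be modeled in a few lines of userspace C. The sketch below is hypothetical and simplified — the struct and function names are invented for illustration, not taken from the patch — but it makes both review points concrete: `last % num + 1` cycles through EQs 1..num and can never yield 0 (so 0 silently stops being an ordinary completion vector), and the update of `last_comp_eq` is an unlocked read-modify-write.

```c
#include <assert.h>

/* Hypothetical, simplified model of the quoted round-robin policy.
 * EQs are numbered 1..num_comp_eqs; a requested vector of 0 means
 * "pick one for me" (the behavior change being discussed). */
struct eq_table {
	int num_comp_eqs;
	int last_comp_eq;
};

static int pick_comp_vector(struct eq_table *t, int requested)
{
	if (requested != 0)
		return requested;	/* caller asked for a specific EQ */

	/* Unlocked read-modify-write, as in the quoted patch; with
	 * concurrent callers two QPs could be handed the same EQ. */
	t->last_comp_eq = t->last_comp_eq % t->num_comp_eqs + 1;
	return t->last_comp_eq;
}
```

With four EQs the picks go 1, 2, 3, 4, then wrap back to 1; vector 0 itself is never assigned, which is why code that assumed 0 is a normal completion vector would change meaning under this policy.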
From rdreier at cisco.com Wed Apr 16 11:56:21 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:56:21 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > +static int mod_param_if_eth = 1; > +module_param_named(if_eth, mod_param_if_eth, bool, 0444); > +MODULE_PARM_DESC(if_eth, "Enable ETH interface be loaded (0/1, default 1)"); > + > +static int mod_param_if_fc = 1; > +module_param_named(if_fc, mod_param_if_fc, bool, 0444); > +MODULE_PARM_DESC(if_fc, "Enable FC interface be loaded (0/1, default 1)"); I don't see any place where these values are checked. And I don't quite know why they would be necessary anyway. Why would someone want to set one of these to 0? Couldn't they get the same effect by just not loading the module in question? - R. From rdreier at cisco.com Wed Apr 16 12:00:33 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 12:00:33 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > int mlx4_qp_to_ready(struct mlx4_dev *dev, > struct mlx4_mtt *mtt, > struct mlx4_qp_context *context, > struct mlx4_qp *qp, > enum mlx4_qp_state *qp_state) I don't see any callers of this function? 
> > +#define STATE_ARR_SIZE 4 > + int err = 0; > + int i; > + enum mlx4_qp_state states[STATE_ARR_SIZE] = { > + MLX4_QP_STATE_RST, > + MLX4_QP_STATE_INIT, > + MLX4_QP_STATE_RTR, > + MLX4_QP_STATE_RTS > + }; > + > + for (i = 0; i < STATE_ARR_SIZE - 1; i++) { I think it's more idiomatic to write this as: enum mlx4_qp_state states[] = { MLX4_QP_STATE_RST, MLX4_QP_STATE_INIT, MLX4_QP_STATE_RTR, MLX4_QP_STATE_RTS }; for (i = 0; i < ARRAY_SIZE(states) - 1; i++) { > + context->flags |= cpu_to_be32(states[i+1] << 28); Do you really want the |= here? INIT == 1, RTR == 2, so on the transition from INIT to RTR the value will be 1|2, ie 3. From holt at sgi.com Wed Apr 16 12:02:13 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 16 Apr 2008 14:02:13 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: <20080416163337.GJ22493@sgi.com> Message-ID: <20080416190213.GK22493@sgi.com> On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > On Wed, 16 Apr 2008, Robin Holt wrote: > > > I don't think this lock mechanism is completely working. I have > > gotten a few failures trying to dereference 0x100100 which appears to > > be LIST_POISON1. > > How does xpmem unregistering of notifiers work? For the tests I have been running, we are waiting for the release callout as part of exit. Thanks, Robin From clameter at sgi.com Wed Apr 16 12:15:08 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 16 Apr 2008 12:15:08 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080416190213.GK22493@sgi.com> References: <20080416163337.GJ22493@sgi.com> <20080416190213.GK22493@sgi.com> Message-ID: On Wed, 16 Apr 2008, Robin Holt wrote: > On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > > On Wed, 16 Apr 2008, Robin Holt wrote: > > > > > I don't think this lock mechanism is completely working. I have > > > gotten a few failures trying to dereference 0x100100 which appears to > > > be LIST_POISON1. > > > > How does xpmem unregistering of notifiers work? > > For the tests I have been running, we are waiting for the release > callout as part of exit. Some more details on the failure may be useful. AFAICT list_del[_rcu] is the culprit here and that is only used on release or unregister. From rdreier at cisco.com Wed Apr 16 12:34:41 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 12:34:41 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, > + u32 pd, u32 access, int max_pages, > + int max_maps, u8 page_shift, struct mlx4_fmr *fmr) So reading this over in more detail, I now really think it has to be split up. There are too many new things added without any users for it to be possible to review. - r. 
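The `|=` question Roland raises above about the QP state-transition loop comes down to bit arithmetic in the 4-bit next-state field. A minimal userspace sketch — helper names are invented for illustration; only the encodings (INIT == 1, RTR == 2) and the bits-31:28 field position come from the mail — shows why OR-ing each new state in cannot work:

```c
#include <assert.h>
#include <stdint.h>

/* Next-state field lives in bits 31:28 of the context flags word. */
enum { QP_STATE_RST = 0, QP_STATE_INIT = 1, QP_STATE_RTR = 2, QP_STATE_RTS = 3 };

/* The pattern under review: OR each new state into the flags. */
static uint32_t set_state_or(uint32_t flags, uint32_t state)
{
	return flags | (state << 28);
}

/* What is presumably intended: clear the field before setting it. */
static uint32_t set_state_masked(uint32_t flags, uint32_t state)
{
	return (flags & ~(0xfu << 28)) | (state << 28);
}
```

Applying INIT and then RTR with the OR version leaves 1|2 == 3 in the field — which reads as RTS, not RTR — exactly the failure mode described in the review.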
From yevgenyp at mellanox.co.il Wed Apr 16 13:15:53 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 16 Apr 2008 23:15:53 +0300 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support In-Reply-To: References: <4805B359.2070906@mellanox.co.il> Message-ID: <6C2C79E72C305246B504CBA17B5500C903CE58A4@mtlexch01.mtl.com> The mlx4_core driver doesn't allow the configuration you described. The mlx4_ib module can always assume that if it has only one IB port, It would always be port number 1. Yevgeny Petrilin Mellanox Technologies phone: +972-4-9097200 (ext. 7677) cell: +972-54-7839222 mailto: yevgenyp at mellanox.co.il -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Wednesday, April 16, 2008 6:43 PM To: Yevgeny Petrilin Cc: general at lists.openfabrics.org Subject: Re: [ofa-general][PATCH] mlx4_ib: Multi Protocol support > Main changes to mlx4_ib: > 1. Mlx4_ib driver queries the low level driver for number of IB ports. > 2. Qps are being reserved prior to being allocated. > 3. Cq allocation API change. As I said before, these mlx4_ib changes should be rolled into the mlx4_core patches that change these interfaces. Also, I don't understand exactly how you're handling which ports are IB and which aren't. Have you tested this code in the case where port 1 is non-IB and port 2 is IB? It seems that you have a bitmap of which ports are IB: > + foreach_port(i, ibdev->ports_map) > + ibdev->num_ports++; (By the way, foreach_port() is too generic a name to expose, since it could easily collide with some general API -- I would use mlx4_foreach_port() instead) But then you do stuff like: > - for (p = 1; p <= dev->caps.num_ports; ++p) > + for (p = 1; p <= ibdev->num_ports; ++p) > mlx4_CLOSE_PORT(dev, p); which doesn't seem to work if you only have one IB port but it isn't port 1. 
I think there are two sane ways to handle non-IB ports in mlx4_ib: - Have mlx4_ib report the number of IB ports as phys_port_cnt and have an indirection table that maps from IB port # to physical HCA port # (to handle the case where only port 2 is IB, so you need to map IB port 1 to HCA physical port 2). This leads to some confusion with the real-world labels on ports I guess, and also I guess you need some SMA trickery to report the right port # to the SM. - Report the number of physical HCA ports as phys_port_cnt and just have non-IB ports always say they're DOWN. This makes changing config on the fly easier, since a port going from DOWN to INIT is a pretty normal thing. I guess there is a little bit of hackery involved in handling requests to mlx4_ib that involve non-IB ports. However your changes seem to take a third way and I don't understand how it can work. Perhaps you can clarify? - R. From rdreier at cisco.com Wed Apr 16 13:24:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 13:24:23 -0700 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903CE58A4@mtlexch01.mtl.com> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 23:15:53 +0300") References: <4805B359.2070906@mellanox.co.il> <6C2C79E72C305246B504CBA17B5500C903CE58A4@mtlexch01.mtl.com> Message-ID: > The mlx4_core driver doesn't allow the configuration you described. > The mlx4_ib module can always assume that if it has only one IB port, > It would always be port number 1. Hmm... why this limitation? - R. From rdreier at cisco.com Wed Apr 16 14:05:16 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 14:05:16 -0700 Subject: [ofa-general] Pending libibverbs patches? 
In-Reply-To: <4805AB90.6060702@voltaire.com> (Or Gerlitz's message of "Wed, 16 Apr 2008 10:32:32 +0300") References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com> Message-ID: > If the section stating the different functions seems not useful it can > be removed, I will be happy to hear what other people think; anyway, > this section is not what this man page is focusing on. I agree that more > has to be said on issues such as IB/iWARP differences, thread-safety, > fork, etc, so in case you prefer to see this "more" coming out before > merging anything, let it be, but please note that it's really hard > for newcomers to start programming IB/iWARP without any man page > that gives some general notion of what libibverbs is. In that > respect, maybe you can merge the first portion of the page without the > function listing, and later we can add more info on the various > issues? OK, if you can send a verbs.7 page that includes what you see as the critical information then I can look at including it. By the way, I went through libibverbs and tried to make everything transport agnostic rather than talking only about InfiniBand. How does this diff look to people? diff --git a/README b/README index 0b1b114..848eb05 100644 --- a/README +++ b/README @@ -1,10 +1,11 @@ Introduction ============ -libibverbs is a library that allows programs to use InfiniBand "verbs" -for direct access to IB hardware from userspace. For more information -on verbs, see the InfiniBand Architecture Specification vol. 1, -especially chapter 11. +libibverbs is a library that allows programs to use RDMA "verbs" for +direct access to RDMA (currently InfiniBand and iWARP) hardware from +userspace. For more information on RDMA verbs, see the InfiniBand +Architecture Specification vol. 1, especially chapter 11, and the RDMA +Consortium's RDMA Protocol Verbs Specification. Using libibverbs ================ @@ -28,9 +29,9 @@ can be used. 
This will create device nodes named /dev/infiniband/uverbs0 -and so on. Since the InfiniBand userspace verbs should be safe for -use by non-privileged, you may want to add an appropriate MODE or -GROUP to your udev rule. +and so on. Since the RDMA userspace verbs should be safe for use by +non-privileged users, you may want to add an appropriate MODE or GROUP +to your udev rule. Permissions ----------- @@ -102,7 +103,7 @@ Bugs should be reported to the OpenFabrics mailing list * Information about your system: - Linux distribution and version - Linux kernel and version - - InfiniBand hardware and firmware version + - InfiniBand/iWARP hardware and firmware version - ... any other relevant information * How to reproduce the bug. Command line arguments for a libibverbs diff --git a/debian/changelog b/debian/changelog index 24582f0..982760d 100644 --- a/debian/changelog +++ b/debian/changelog @@ -4,9 +4,11 @@ libibverbs (1.1.1-2) unstable; urgency=low * Use DEB_DH_MAKESHLIBS_ARGS_ALL to pass appropriate -V option to dh_makeshlibs, since new symbols were added in libibverbs 1.1.0. (Closes: #465435) - * Add debian/watch file + * Add debian/watch file. + * Update control file to talk about generic RDMA and iWARP, not just + InfiniBand, since libibverbs works with both IB and iWARP. - -- Roland Dreier Wed, 12 Mar 2008 10:39:38 -0700 + -- Roland Dreier Wed, 16 Apr 2008 14:01:58 -0700 libibverbs (1.1.1-1) unstable; urgency=low diff --git a/debian/control.in b/debian/control.in index 62299fd..7cf933a 100644 --- a/debian/control.in +++ b/debian/control.in @@ -10,13 +10,14 @@ Package: libibverbs1 Section: libs Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends}, adduser -Description: A library for direct userspace use of InfiniBand - libibverbs is a library that allows userspace processes to use - InfiniBand "verbs" as described in the InfiniBand Architecture - Specification. InfiniBand is a high-throughput, low-latency - networking technology. 
InfiniBand host channel adapters (HCAs) - commonly support direct hardware access from userspace (kernel - bypass), and libibverbs supports this when available. +Description: A library for direct userspace use of RDMA (InfiniBand/iWARP) + libibverbs is a library that allows userspace processes to use RDMA + "verbs" as described in the InfiniBand Architecture Specification and + the RDMA Protocol Verbs Specification. iWARP NICs support RDMA over + ethernet, while InfiniBand is a high-throughput, low-latency + networking technology. InfiniBand host channel adapters (HCAs) and + iWARP NICs commonly support direct hardware access from userspace + (kernel bypass), and libibverbs supports this when available. . For this library to be useful, a device-specific plug-in module should also be installed. @@ -28,12 +29,13 @@ Section: libdevel Architecture: any Depends: ${misc:Depends}, libibverbs1 (= ${binary:Version}) Description: Development files for the libibverbs library - libibverbs is a library that allows userspace processes to use - InfiniBand "verbs" as described in the InfiniBand Architecture - Specification. InfiniBand is a high-throughput, low-latency - networking technology. InfiniBand host channel adapters (HCAs) - commonly support direct hardware access from userspace (kernel - bypass), and libibverbs supports this when available. + libibverbs is a library that allows userspace processes to use RDMA + "verbs" as described in the InfiniBand Architecture Specification and + the RDMA Protocol Verbs Specification. iWARP NICs support RDMA over + ethernet, while InfiniBand is a high-throughput, low-latency + networking technology. InfiniBand host channel adapters (HCAs) and + iWARP NICs commonly support direct hardware access from userspace + (kernel bypass), and libibverbs supports this when available. . This package is needed to compile programs against libibverbs1. 
It contains the header files and static libraries (optionally) @@ -45,12 +47,13 @@ Priority: extra Architecture: any Depends: ${misc:Depends}, libibverbs1 (= ${binary:Version}) Description: Debugging symbols for the libibverbs library - libibverbs is a library that allows userspace processes to use - InfiniBand "verbs" as described in the InfiniBand Architecture - Specification. InfiniBand is a high-throughput, low-latency - networking technology. InfiniBand host channel adapters (HCAs) - commonly support direct hardware access from userspace (kernel - bypass), and libibverbs supports this when available. + libibverbs is a library that allows userspace processes to use RDMA + "verbs" as described in the InfiniBand Architecture Specification and + the RDMA Protocol Verbs Specification. iWARP NICs support RDMA over + ethernet, while InfiniBand is a high-throughput, low-latency + networking technology. InfiniBand host channel adapters (HCAs) and + iWARP NICs commonly support direct hardware access from userspace + (kernel bypass), and libibverbs supports this when available. . This package contains the debugging symbols associated with libibverbs1. They will automatically be used by gdb for debugging @@ -61,12 +64,13 @@ Section: net Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends} Description: Examples for the libibverbs library - libibverbs is a library that allows userspace processes to use - InfiniBand "verbs" as described in the InfiniBand Architecture - Specification. InfiniBand is a high-throughput, low-latency - networking technology. InfiniBand host channel adapters (HCAs) - commonly support direct hardware access from userspace (kernel - bypass), and libibverbs supports this when available. + libibverbs is a library that allows userspace processes to use RDMA + "verbs" as described in the InfiniBand Architecture Specification and + the RDMA Protocol Verbs Specification. 
iWARP NICs support RDMA over + ethernet, while InfiniBand is a high-throughput, low-latency + networking technology. InfiniBand host channel adapters (HCAs) and + iWARP NICs commonly support direct hardware access from userspace + (kernel bypass), and libibverbs supports this when available. . This package contains useful libibverbs1 example programs such as ibv_devinfo, which displays information about InfiniBand devices. diff --git a/libibverbs.spec.in b/libibverbs.spec.in index ad57c61..f092b68 100644 --- a/libibverbs.spec.in +++ b/libibverbs.spec.in @@ -1,7 +1,7 @@ Name: libibverbs Version: 1.1.1 Release: 1%{?dist} -Summary: A library for direct userspace use of InfiniBand hardware +Summary: A library for direct userspace use of RDMA (InfiniBand/iWARP) hardware Group: System Environment/Libraries License: GPLv2 or BSD @@ -12,10 +12,10 @@ Requires(post): /sbin/ldconfig Requires(postun): /sbin/ldconfig %description -libibverbs is a library that allows userspace processes to use -InfiniBand "verbs" as described in the InfiniBand Architecture -Specification. This includes direct hardware access for fast path -operations. +libibverbs is a library that allows userspace processes to use RDMA +"verbs" as described in the InfiniBand Architecture Specification and +the RDMA Protocol Verbs Specification. This includes direct hardware +access for fast path operations. For this library to be useful, a device-specific plug-in module should also be installed. @@ -41,7 +41,7 @@ Requires: %{name} = %{version}-%{release} %description utils Useful libibverbs1 example programs such as ibv_devinfo, which -displays information about InfiniBand devices. +displays information about RDMA devices. 
%prep %setup -q -n %{name}- at VERSION@ diff --git a/man/ibv_alloc_pd.3 b/man/ibv_alloc_pd.3 index 017ab32..28b7953 100644 --- a/man/ibv_alloc_pd.3 +++ b/man/ibv_alloc_pd.3 @@ -13,7 +13,7 @@ ibv_alloc_pd, ibv_dealloc_pd \- allocate or deallocate a protection domain (PDs) .fi .SH "DESCRIPTION" .B ibv_alloc_pd() -allocates a PD for the InfiniBand device context +allocates a PD for the RDMA device context .I context\fR. .PP .B ibv_dealloc_pd() @@ -27,8 +27,8 @@ returns a pointer to the allocated PD, or NULL if the request fails. returns 0 on success, or the value of errno on failure (which indicates the failure reason). .SH "NOTES" .B ibv_dealloc_pd() -may fail if any other InfiniBand resource is still associated with the -PD being freed. +may fail if any other resource is still associated with the PD being +freed. .SH "SEE ALSO" .BR ibv_reg_mr (3), .BR ibv_create_srq (3), diff --git a/man/ibv_asyncwatch.1 b/man/ibv_asyncwatch.1 index aed316d..ece25f8 100644 --- a/man/ibv_asyncwatch.1 +++ b/man/ibv_asyncwatch.1 @@ -8,7 +8,7 @@ ibv_asyncwatch \- display asynchronous events .SH DESCRIPTION .PP -Display asynchronous events forwarded to userspace for an InfiniBand device. +Display asynchronous events forwarded to userspace for an RDMA device. 
.SH AUTHORS .TP diff --git a/man/ibv_create_ah_from_wc.3 b/man/ibv_create_ah_from_wc.3 index 487f053..bc5d135 100644 --- a/man/ibv_create_ah_from_wc.3 +++ b/man/ibv_create_ah_from_wc.3 @@ -21,7 +21,7 @@ address handle (AH) from a work completion .B ibv_init_ah_from_wc() initializes the address handle (AH) attribute structure .I ah_attr -for the InfiniBand device context +for the RDMA device context .I context using the port number .I port_num\fR, diff --git a/man/ibv_create_comp_channel.3 b/man/ibv_create_comp_channel.3 index e0e1e68..d8e17f1 100644 --- a/man/ibv_create_comp_channel.3 +++ b/man/ibv_create_comp_channel.3 @@ -15,7 +15,7 @@ destroy a completion event channel .fi .SH "DESCRIPTION" .B ibv_create_comp_channel() -creates a completion event channel for the InfiniBand device context +creates a completion event channel for the RDMA device context .I context\fR. .PP .B ibv_destroy_comp_channel() @@ -29,13 +29,14 @@ returns a pointer to the created completion event channel, or NULL if the reques returns 0 on success, or the value of errno on failure (which indicates the failure reason). .SH "NOTES" A "completion channel" is an abstraction introduced by libibverbs that -does not exist in the InfiniBand Architecture verbs specification. A -completion channel is essentially file descriptor that is used to -deliver completion notifications to a userspace process. When a -completion event is generated for a completion queue (CQ), the event -is delivered via the completion channel attached to that CQ. This may -be useful to steer completion events to different threads by using -multiple completion channels. +does not exist in the InfiniBand Architecture verbs specification or +RDMA Protocol Verbs Specification. A completion channel is +essentially file descriptor that is used to deliver completion +notifications to a userspace process. 
When a completion event is +generated for a completion queue (CQ), the event is delivered via the +completion channel attached to that CQ. This may be useful to steer +completion events to different threads by using multiple completion +channels. .PP .B ibv_destroy_comp_channel() fails if any CQs are still associated with the completion event diff --git a/man/ibv_create_cq.3 b/man/ibv_create_cq.3 index bb256d5..211feea 100644 --- a/man/ibv_create_cq.3 +++ b/man/ibv_create_cq.3 @@ -18,7 +18,7 @@ ibv_create_cq, ibv_destroy_cq \- create or destroy a completion queue (CQ) .B ibv_create_cq() creates a completion queue (CQ) with at least .I cqe -entries for the InfiniBand device context +entries for the RDMA device context .I context\fR. The pointer .I cq_context diff --git a/man/ibv_devices.1 b/man/ibv_devices.1 index 084d01a..99b27e5 100644 --- a/man/ibv_devices.1 +++ b/man/ibv_devices.1 @@ -1,14 +1,14 @@ .TH IBV_DEVICES 1 "August 30, 2005" "libibverbs" "USER COMMANDS" .SH NAME -ibv_devices \- list InfiniBand devices +ibv_devices \- list RDMA devices .SH SYNOPSIS .B ibv_devices .SH DESCRIPTION .PP -List InfiniBand devices available for use from userspace. +List RDMA devices available for use from userspace. .SH SEE ALSO .BR ibv_devinfo (1) diff --git a/man/ibv_devinfo.1 b/man/ibv_devinfo.1 index 5656e14..41878b2 100644 --- a/man/ibv_devinfo.1 +++ b/man/ibv_devinfo.1 @@ -1,7 +1,7 @@ .TH IBV_DEVINFO 1 "August 30, 2005" "libibverbs" "USER COMMANDS" .SH NAME -ibv_devinfo \- query InfiniBand devices +ibv_devinfo \- query RDMA devices .SH SYNOPSIS .B ibv_devinfo @@ -9,7 +9,7 @@ ibv_devinfo \- query InfiniBand devices .SH DESCRIPTION .PP -Print information about InfiniBand devices available for use from userspace. +Print information about RDMA devices available for use from userspace. 
.SH OPTIONS @@ -22,10 +22,10 @@ use IB device \fIDEVICE\fR (default first device found) query port \fIPORT\fR (default all ports) \fB\-l\fR, \fB\-\-list\fR -only list names of InfiniBand devices +only list names of RDMA devices \fB\-v\fR, \fB\-\-verbose\fR -print all available information about InfiniBand devices +print all available information about RDMA devices .SH SEE ALSO .BR ibv_devices (1) diff --git a/man/ibv_get_async_event.3 b/man/ibv_get_async_event.3 index 77e8be8..076f757 100644 --- a/man/ibv_get_async_event.3 +++ b/man/ibv_get_async_event.3 @@ -14,7 +14,7 @@ ibv_get_async_event, ibv_ack_async_event \- get or acknowledge asynchronous even .fi .SH "DESCRIPTION" .B ibv_get_async_event() -waits for the next async event of the InfiniBand device context +waits for the next async event of the RDMA device context .I context and returns it through the pointer .I event\fR, diff --git a/man/ibv_get_device_guid.3 b/man/ibv_get_device_guid.3 index 03f444a..98c0499 100644 --- a/man/ibv_get_device_guid.3 +++ b/man/ibv_get_device_guid.3 @@ -2,7 +2,7 @@ .\" .TH IBV_GET_DEVICE_GUID 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_get_device_guid \- get an InfiniBand device's GUID +ibv_get_device_guid \- get an RDMA device's GUID .SH "SYNOPSIS" .nf .B #include @@ -11,7 +11,7 @@ ibv_get_device_guid \- get an InfiniBand device's GUID .fi .SH "DESCRIPTION" .B ibv_get_device_name() -returns the Global Unique IDentifier (GUID) of the InfiniBand device +returns the Global Unique IDentifier (GUID) of the RDMA device .I device\fR. 
.SH "RETURN VALUE" .B ibv_get_device_guid() diff --git a/man/ibv_get_device_list.3 b/man/ibv_get_device_list.3 index 4dd8180..104c137 100644 --- a/man/ibv_get_device_list.3 +++ b/man/ibv_get_device_list.3 @@ -2,7 +2,7 @@ .\" .TH IBV_GET_DEVICE_LIST 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_get_device_list, ibv_free_device_list \- get and release list of available InfiniBand devices +ibv_get_device_list, ibv_free_device_list \- get and release list of available RDMA devices .SH "SYNOPSIS" .nf .B #include @@ -13,7 +13,7 @@ ibv_get_device_list, ibv_free_device_list \- get and release list of available I .fi .SH "DESCRIPTION" .B ibv_get_device_list() -returns a NULL-terminated array of InfiniBand devices currently available. +returns a NULL-terminated array of RDMA devices currently available. The argument .I num_devices is optional; if not NULL, it is set to the number of devices returned in the array. @@ -25,7 +25,7 @@ returned by .B ibv_get_device_list()\fR. .SH "RETURN VALUE" .B ibv_get_device_list() -returns the array of available InfiniBand devices, or NULL if the request fails. +returns the array of available RDMA devices, or NULL if the request fails. .PP .B ibv_free_device_list() returns no value. diff --git a/man/ibv_get_device_name.3 b/man/ibv_get_device_name.3 index c53f97d..284ea9f 100644 --- a/man/ibv_get_device_name.3 +++ b/man/ibv_get_device_name.3 @@ -2,7 +2,7 @@ .\" .TH IBV_GET_DEVICE_NAME 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_get_device_name \- get an InfiniBand device's name +ibv_get_device_name \- get an RDMA device's name .SH "SYNOPSIS" .nf .B #include @@ -11,7 +11,7 @@ ibv_get_device_name \- get an InfiniBand device's name .fi .SH "DESCRIPTION" .B ibv_get_device_name() -returns a human-readable name associated with the InfiniBand device +returns a human-readable name associated with the RDMA device .I device\fR. 
.SH "RETURN VALUE" .B ibv_get_device_name() diff --git a/man/ibv_open_device.3 b/man/ibv_open_device.3 index 1858a42..61fa82b 100644 --- a/man/ibv_open_device.3 +++ b/man/ibv_open_device.3 @@ -2,7 +2,7 @@ .\" .TH IBV_OPEN_DEVICE 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_open_device, ibv_close_device \- open and close an InfiniBand device context +ibv_open_device, ibv_close_device \- open and close an RDMA device context .SH "SYNOPSIS" .nf .B #include diff --git a/man/ibv_query_device.3 b/man/ibv_query_device.3 index f327769..3bf7511 100644 --- a/man/ibv_query_device.3 +++ b/man/ibv_query_device.3 @@ -2,7 +2,7 @@ .\" .TH IBV_QUERY_DEVICE 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_query_device \- query an InfiniBand device's attributes +ibv_query_device \- query an RDMA device's attributes .SH "SYNOPSIS" .nf .B #include diff --git a/man/ibv_query_port.3 b/man/ibv_query_port.3 index fd61eb9..c6b3b63 100644 --- a/man/ibv_query_port.3 +++ b/man/ibv_query_port.3 @@ -2,7 +2,7 @@ .\" .TH IBV_QUERY_PORT 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_query_port \- query an InfiniBand port's attributes +ibv_query_port \- query an RDMA port's attributes .SH "SYNOPSIS" .nf .B #include diff --git a/man/ibv_query_qp.3 b/man/ibv_query_qp.3 index fd1f41d..8da270e 100644 --- a/man/ibv_query_qp.3 +++ b/man/ibv_query_qp.3 @@ -68,7 +68,7 @@ returns 0 on success, or the value of errno on failure (which indicates the fail The argument .I attr_mask is a hint that specifies the minimum list of attributes to retrieve. -Some InfiniBand devices may return extra attributes not requested, for +Some RDMA devices may return extra attributes not requested, for example if the value can be returned cheaply. 
.PP Attribute values are valid if they have been set using From sean.hefty at intel.com Wed Apr 16 15:05:37 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 16 Apr 2008 15:05:37 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com> Message-ID: <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> >+ the RDMA Protocol Verbs Specification. iWARP NICs support RDMA over >+ ethernet, while InfiniBand is a high-throughput, low-latency I'm not convinced this is really better for a high-level readme, but would saying "iWarp Ethernet NICs support RDMA over TCP" be clearer? I'm thinking about ConnectX, which provides RDMA over Ethernet, but doesn't use the iWarp protocol. And iWarp RDMA should be usable even if crossing non-Ethernet subnets. - Sean From eddiem at sgi.com Wed Apr 16 15:27:14 2008 From: eddiem at sgi.com (Edward Mascarenhas) Date: Wed, 16 Apr 2008 15:27:14 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> <15ddcffd0804032117o21e6d62br9def3e46d4d513c4@mail.gmail.com> Message-ID: <48067D42.3020303@sgi.com> The SGI Altix ICE cluster system supports 2 InfiniBand fabrics. http://www.sgi.com/products/servers/altix/ice/ Each compute node has 2 HCAs and each is connected to a separate fabric. We recommend that users use one fabric for storage traffic and the other for MPI, but there is no reason why both fabrics could not be used for MPI. OpenMPI requires setting a separate subnet prefix for each fabric to use both fabrics for MPI and OpenSM supports this setting of subnet prefix. Other MPIs do not require this. Edward on 04/04/2008 08:08 AM Tang, Changqing said the following: > What I mean "claim to support" is to have more people to test with this config. 
> > --CQ > >> -----Original Message----- >> From: Or Gerlitz [mailto:or.gerlitz at gmail.com] >> Sent: Thursday, April 03, 2008 11:18 PM >> To: Tang, Changqing >> Cc: general at lists.openfabrics.org; ewg at lists.openfabrics.org >> Subject: Re: [ofa-general] Re: [ewg] OFED March 24 meeting >> summary on OFED 1.4 plans >> >> On Thu, Apr 3, 2008 at 5:40 PM, Tang, Changqing >> wrote: >> >>> The problem is, from the MPI side (and by default), we don't >> know which >>> port is on which fabric, since the subnet prefix is the >> same. We rely >>> on the system admin to configure two different subnet prefixes >> for HP-MPI to work. >>> No vendor has claimed to support this. >> CQ, not supporting a different subnet prefix per IB subnet is >> against IB nature; I don't think there should be any problem >> configuring a different prefix at each OpenSM instance, and >> the Linux host stack would work perfectly under this config. >> If you are aware of any problem in opensm and/or the >> host stack, please let the community know and the maintainers >> will fix it. >> >> Or. >> > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From rdreier at cisco.com Wed Apr 16 15:54:13 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 15:54:13 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> (Sean Hefty's message of "Wed, 16 Apr 2008 15:05:37 -0700") References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com> <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> Message-ID: > I'm not convinced this is really better for a high-level readme, but would > saying "iWarp Ethernet NICs support RDMA over TCP" be clearer? I'm thinking > about ConnectX, which provides RDMA over Ethernet, but doesn't use the iWarp > protocol. 
And iWarp RDMA should be usable even if crossing non-Ethernet > subnets. It's a good point... I wasn't sure how to phrase things in the best way. All current iWARP NICs do TCP, but there is an IETF RFC for iWARP over SCTP too. So one could say "RDMA over IP" but then again the ConnectX ethernet thing is really RDMA over IP too... - R. From rdreier at cisco.com Wed Apr 16 21:26:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 21:26:32 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> (Sean Hefty's message of "Wed, 16 Apr 2008 15:05:37 -0700") References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com> <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> Message-ID: How about "iWARP ethernet NICs support RDMA over hardware-offloaded TCP"? From dorfman.eli at gmail.com Thu Apr 17 04:13:01 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Thu, 17 Apr 2008 14:13:01 +0300 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> <20080416144830.GC23861@osc.edu> Message-ID: <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> On Wed, Apr 16, 2008 at 6:46 PM, Roland Dreier wrote: > > Agree with the interpretation of the spec, and it's probably a bit > > clearer that way too. But we have working initiators and targets > > that do it the "wrong" way. > > Yes... I guess the key question is whether there are any initiators that > do things the "right" way. > > > > 1. Flag day: all initiators and targets change at the same time. > > Will see data corruption if someone unluckily runs one or the other > > using old non-fixed code. > > Seems unacceptable to me... it doesn't make sense at all to break every > setup in the world just to be "right" according to the spec. 
This will break only when both initiator and target use InitialR2T=No, which means unsolicited data is allowed. As far as I know, STGT is not very common (and its version in RHEL5.1 is considered experimental). Its default is also InitialR2T=Yes. Voltaire's iSCSI over iSER target also uses the default InitialR2T=Yes. So it seems that nothing will break. > > > > 2. Rewrite the IB Annex to codify what's done in practice, and don't > > "fix" any code. > > If existing practice is universally to do things "wrong" then this seems > to me by far the best way to proceed. Assuming there aren't many iSER installations that currently work with unsolicited data, then it is the right time to do it right. Future implementations will rely on the spec, and unless you modify the spec this will lead to greater confusion. From holt at sgi.com Thu Apr 17 04:14:04 2008 From: holt at sgi.com (Robin Holt) Date: Thu, 17 Apr 2008 06:14:04 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: <20080416163337.GJ22493@sgi.com> <20080416190213.GK22493@sgi.com> Message-ID: <20080417111404.GL22493@sgi.com> On Wed, Apr 16, 2008 at 12:15:08PM -0700, Christoph Lameter wrote: > On Wed, 16 Apr 2008, Robin Holt wrote: > > > On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > > > On Wed, 16 Apr 2008, Robin Holt wrote: > > > > > > > I don't think this lock mechanism is completely working. I have > > > > gotten a few failures trying to dereference 0x100100 which appears to > > > > be LIST_POISON1. > > > > > > How does xpmem unregistering of notifiers work? > > > > For the tests I have been running, we are waiting for the release > > callout as part of exit. > > Some more details on the failure may be useful. AFAICT list_del[_rcu] is > the culprit here and that is only used on release or unregister. I think I have this understood now. 
It happens quite quickly (within 10 minutes) on a 128 rank job of small data set in a loop. In these failing jobs, all the ranks are nearly symmetric. There is a certain part of each ranks address space that has access granted. All the ranks have included all the other ranks including themselves in exactly the same layout at exactly the same virtual address. Rank 3 has hit _release and is beginning to clean up, but has not deleted the notifier from its list. Rank 9 calls the xpmem_invalidate_page() callout. That page was attached by rank 3 so we call zap_page_range on rank 3 which then calls back into xpmem's invalidate_range_start callout. The rank 3 _release callout begins and deletes its notifier from the list. Rank 9's call to rank 3's zap_page_range notifier returns and dereferences LIST_POISON1. I often confuse myself while trying to explain these so please kick me where the holes in the flow appear. The console output from the simple debugging stuff I put in is a bit overwhelming. I am trying to figure out now which locks we hold as part of the zap callout that should have prevented the _release callout. Thanks, Robin From liranl at mellanox.co.il Thu Apr 17 05:36:44 2008 From: liranl at mellanox.co.il (Liran Liss) Date: Thu, 17 Apr 2008 15:36:44 +0300 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support In-Reply-To: Message-ID: <40FA0A8088E8A441973D37502F00933E39FD@mtlexch01.mtl.com> > > I think there are two sane ways to handle non-IB ports in mlx4_ib: > > - Have mlx4_ib report the number of IB ports as phys_port_cnt and have > an indirection table that maps from IB port # to physical HCA port # > (to handle the case where only port 2 is IB, so you need to map IB > port 1 to HCA physical port 2). This leads to some confusion with > the real-world labels on ports I guess, and also I guess you need > some SMA trickery to report the right port # to the SM. 
> > - Report the number of physical HCA ports as phys_port_cnt and just > have non-IB ports always say they're DOWN. This makes changing > config on the fly easier, since a port going from DOWN to INIT is a > pretty normal thing. I guess there is a little bit of hackery > involved in handling requests to mlx4_ib that involve non-IB ports. > > However your changes seem to take a third way and I don't understand how > it can work. Perhaps you can clarify? > > - R. We intend to handle non-IB ports (Ethernet) just like IB ports, where all IB traffic that passes through Ethernet ports is IBoE. So basically, we will register ConnectX as a dual-ported HCA for all configurations. Many ULPs would run transparently on IB or IBoE, depending on the port type. In addition, port numbers always remain true to their physical ports. Until the IBoE implementation is completed, we temporarily disallow the configuration in which port 1 is eth and port 2 is ib. This allows us to register ConnectX as a single-port HCA with the ib core when port 2 is eth, without the aforementioned (and temporary) hacks. --Liran From liranl at mellanox.co.il Thu Apr 17 05:59:48 2008 From: liranl at mellanox.co.il (Liran Liss) Date: Thu, 17 Apr 2008 15:59:48 +0300 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: Message-ID: <40FA0A8088E8A441973D37502F00933E39FE@mtlexch01.mtl.com> > > + if (vector == 0) { > > + vector = priv->eq_table.last_comp_eq % > > + priv->eq_table.num_comp_eqs + 1; > > + priv->eq_table.last_comp_eq = vector; > > + } > > The current IB code is written assuming that 0 is a normal completion > vector I think. Making 0 be a special "round robin" value is a pretty > big change of policy. > This is a change in policy that was unknown and not configured anywhere... Generally, distributing the interrupt load (and the software interrupt handling associated with it) among all CPUs is a good thing, especially when the ULPs using these interrupts are unrelated. 
For example, distributing TCP flows among multiple cores is important for 10GE devices to sustain wire-speed with lots of connections. So, for applications that don't care how many vectors there are or which vector they use, we should support some VECTOR_ANY value that enables mlx4_core to optimize and load balance the interrupt load. A round-robin scheme seems like a good start. We could also initially make the VECTOR_ANY policy a module parameter (i.e., use either CPU0 or round-robin) until we obtain more experience with actual deployments. As for the VECTOR_ANY value, we can make it 0 (good for "porting" all existing ULPs and user-apps but doesn't match the CPU numbering, which is zero based) or some other designated value, e.g., 0xff (will require modifying all ULPs that don't use specific vectors). Any preferences? --Liran From yevgenyp at mellanox.co.il Thu Apr 17 06:03:30 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Thu, 17 Apr 2008 16:03:30 +0300 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: References: <4805B1C6.80004@mellanox.co.il> Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36D43@mtlexch01.mtl.com> Thank you for the comprehensive review. I will split the patches by topics and send them separately, along with the fixes to the other remarks you made. Yevgeny -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Wednesday, April 16, 2008 6:34 PM To: Yevgeny Petrilin Cc: general at lists.openfabrics.org Subject: Re: [ofa-general][PATCH] mlx4_core: Multi Protocol support Your email has > Content-Type: text/plain; charset=ISO-8859-1; format=flowed and the format=flowed means that the patch gets corrupted and won't apply. So when you resend, please fix. I don't think we can really apply this as one patch -- it does too many things at once and needs to be split up... I think pretty much each of these items is independent and could be a separate patch: > 1. 
Mlx4 device now holds the actual protocol for each port. > The port types are determined through module parameters or through sysfs > interface. The requested types are verified with firmware capabilities > in order to determine the actual port protocol. > 2. The driver now manages Mac and Vlan tables used by customers of the low > level driver. Corresponding commands were added. > 3. Completion eq's are created per cpu. Created cq's are attached to an eq by > "Round Robin" algorithm, unless a specific eq was requested. > 4. Creation of a collapsed cq support was added. > 5. Additional reserved qp ranges were added. There is a range for the customers > of the low level driver (IB, Ethernet, FCoE). > 6. Qp allocation process changed. > First a qp range should be reserved, then qps can be allocated from that > range. This is to support the ability to allocate consecutive qps. > Appropriate changes were made in the allocation mechanism. > 7. Common actions to all HW resource management (Doorbell allocation, > Buffer allocation, Mtt write) were moved to the low level driver. Also, on the other hand, the current two patches are too split up: if I apply this patch then mlx4_ib won't compile until the second patch goes in too. Which means someone trying to bisect an mlx4 bug gets into trouble. So please make sure that everything still compiles and works after each patch is applied. By the way, the multiple EQ stuff is a pretty major change in behavior... are we really ready for this? Round robin seems like it could easily lead to worst-case behavior for some plausible workloads. Finally, checkpatch.pl shows a few minor whitespace problems... please fix when you resend. - R. 
From amirv at mellanox.co.il Thu Apr 17 06:23:00 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Thu, 17 Apr 2008 16:23:00 +0300 Subject: [ofa-general] CM goes to timewait state without waiting for disconnect reply Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36D79@mtlexch01.mtl.com> Sean Hi, I'm working on some SDP bugs in OFED 1.3. In the spec, the normal flow to close a connection at the client side is: State "Established" ---- send DREQ ---> State "DREQ sent" --- receive DREP ---> State "TimeWait" ---> State "Idle" According to the code and tests I did, it seems that ib_cm doesn't wait for DREP and goes directly from "DREQ sent" into "TimeWait". This is obviously not good, because the client might think the connection is closed while the CM on the server side isn't in listen/timewait mode. I think that this is a bug, am I right? --- Amir -------------- next part -------------- An HTML attachment was scrubbed... URL: From moshek at voltaire.com Thu Apr 17 06:47:35 2008 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 17 Apr 2008 16:47:35 +0300 Subject: [ofa-general] Starting openibd before the network service In-Reply-To: <4805F692.1040101@dev.mellanox.co.il> References: <4805F692.1040101@dev.mellanox.co.il> Message-ID: <39C75744D164D948A170E9792AF8E7CAC5AF06@exil.voltaire.com> From the bonding and ipoib point of view, it's better to have openibd started before the network service is started. In the openibd script we find that on SUSE the network service is started before openibd -> ### BEGIN INIT INFO # Provides: openibd # Required-Start: $local_fs $network Can someone explain why? Can we change it before OFED-1.3.1? 
Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com From hrosenstock at xsigo.com Thu Apr 17 07:30:08 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 17 Apr 2008 07:30:08 -0700 Subject: [ofa-general] mlx4_core internal error with OFED 1.2.5.4 Message-ID: <1208442608.26936.143.camel@hrosenstock-ws.xsigo.com> Hi, I'm running OFED 1.2.5.4 and got the following: mlx4_core 0000:01:00.0: Internal error detected: mlx4_core 0000:01:00.0: buf[00]: 00020000 mlx4_core 0000:01:00.0: buf[01]: c0010eb6 mlx4_core 0000:01:00.0: buf[02]: 20030000 mlx4_core 0000:01:00.0: buf[03]: 00000000 mlx4_core 0000:01:00.0: buf[04]: 00000000 mlx4_core 0000:01:00.0: buf[05]: 00000000 mlx4_core 0000:01:00.0: buf[06]: 00000000 mlx4_core 0000:01:00.0: buf[07]: 00000000 mlx4_core 0000:01:00.0: buf[08]: 00000000 mlx4_core 0000:01:00.0: buf[09]: 00000000 mlx4_core 0000:01:00.0: buf[0a]: 00000000 mlx4_core 0000:01:00.0: buf[0b]: 00000000 mlx4_core 0000:01:00.0: buf[0c]: 00000000 mlx4_core 0000:01:00.0: buf[0d]: 00000000 mlx4_core 0000:01:00.0: buf[0e]: 00000000 mlx4_core 0000:01:00.0: buf[0f]: 00000000 Is there any more information that can be provided by decoding this as to what the error was? Thanks. 
-- Hal From rdreier at cisco.com Thu Apr 17 07:53:33 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 17 Apr 2008 07:53:33 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get the first batch of things queued for 2.6.26: sparse cleanups, new HW support for the ipath driver, IPoIB updates, and miscellaneous fixes all over. Arthur Jones (7): IB/ipath: Fix sparse warning about pointer signedness IB/ipath: Misc sparse warning cleanup IB/ipath: Provide I/O bus speeds for diagnostic purposes IB/ipath: Fix link up LED display IB/ipath: User mode send DMA header file IB/ipath: User mode send DMA IB/ipath: Misc changes to prepare for IB7220 introduction Dave Olson (10): IB/ipath: Make some constants chip-specific, related cleanup IB/ipath: Shared context code needs to be sure device is usable IB/ipath: Enable 4KB MTU IB/ipath: HW workaround for case where chip can send but not receive IB/ipath: Make link state transition code ignore (transient) link recovery IB/ipath: Add support for IBTA 1.2 Heartbeat IB/ipath: Set LID filtering for HCAs that support it. IB/ipath: Enable reduced PIO update for HCAs that support it. 
IB/ipath: Fix check for no interrupts to reliably fallback to INTx IB/ipath: add calls to new 7220 code and enable in build David Dillow (1): IB/srp: Enforce protocol limit on srp_sg_tablesize Dotan Barak (3): IB/core: Check optional verbs before using them IB/mthca: Update QP state if query QP succeeds IB/mlx4: Update QP state if query QP succeeds Eli Cohen (13): IPoIB: Use checksum offload support if available IB/mlx4: Add IPoIB checksum offload support IB/mthca: Add IPoIB checksum offload support IB/core: Add creation flags to struct ib_qp_init_attr IB/core: Add IPoIB UD LSO support IPoIB: Add LSO support IB/mlx4: Add IPoIB LSO support IPoIB: Add basic ethtool support IB/core: Add support for modify CQ IPoIB: Support modifying IPoIB CQ event moderation IB/mlx4: Add support for modifying CQ moderation parameters IB/mlx4: Fix race when detaching a QP from a multicast group IB/mlx4: Fix incorrect comment Erez Zilber (2): IB/iser: Release connection resources on RDMA_CM_EVENT_DEVICE_REMOVAL event IB/iser: Don't change itt endianness Harvey Harrison (1): IB: Replace remaining __FUNCTION__ occurrences with __func__ Hoang-Nam Nguyen (1): IB/ehca: Remove tgid checking Jack Morgenstein (3): mlx4_core: Increase max number of QPs to 128K IB/mthca: Update module version and release date IB/mlx4: Update module version and release date John Gregor (2): IB/ipath: Head of Line blocking vs forward progress of user apps IB/ipath: Add code for IBA7220 send DMA Julia Lawall (1): RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0 Michael Albaugh (5): IB/ipath: Prevent link-recovery code from negating admin disable IB/ipath: EEPROM support for 7220 devices, robustness improvements, cleanup IB/ipath: Allow old and new diagnostic packet formats IB/ipath: Isolate 7220-specific content IB/ipath: Support for SerDes portion of IBA7220 Ralph Campbell (18): IB/ipath: Fix byte order of pioavail in handle_errors() IB/ipath: Fix error recovery for send buffer status after chip freeze 
mode IB/ipath: Don't try to handle freeze mode HW errors if diagnostic mode IB/ipath: Make debug error message match the constraint that is checked for IB/ipath: Add code to support multiple link speeds and widths IB/ipath: Remove useless comments IB/ipath: Fix sanity checks on QP number of WRs and SGEs IB/ipath: Change the module author IB/ipath: Remove some useless (void) casts IB/ipath: Make send buffers available for kernel if not allocated to user IB/ipath: Use PIO buffer for RC ACKs IB/ipath: Fix some white space and code style issues IB/ipath: Add support for 7220 receive queue changes IB/ipath: Fix up error handling IB/ipath: Header file changes to support IBA7220 IB/ipath: HCA-specific code to support IBA7220 IB/ipath: Add IBA7220-specific SERDES initialization data IB/ipath: Update copyright dates for files changed in 2008 Robert P. J. Day (3): IB: Use shorter list_splice_init() for brevity RDMA/nes: Use more concise list_for_each_entry() IB/ipath: Fix time comparison to use time_after_eq() Roland Dreier (31): IB/mthca: Formatting cleanups IB/mlx4: Convert "if(foo)" to "if (foo)" mlx4_core: Move opening brace of function onto a new line RDMA/amso1100: Don't use 0UL as a NULL pointer RDMA/cxgb3: IDR IDs are signed IB: Make struct ib_uobject.id a signed int IB/ipath: Fix sparse warning about shadowed symbol IB/mlx4: Endianness annotations IB/cm: Endianness annotations RDMA/ucma: Endian annotation RDMA/nes: Trivial endianness annotations RDMA/nes: Delete unused variables RDMA/amso1100: Start of endianness annotation RDMA/amso1100: Endian annotate mqsq allocator mlx4_core: Fix confusion between mlx4_event and mlx4_dev_event enums IB/uverbs: Don't store struct file * for event files IB/uverbs: Use alloc_file() instead of get_empty_filp() RDMA/nes: Remove redundant NULL check in nes_unregister_ofa_device() RDMA/nes: Remove unused nes_netdev_exit() function RDMA/nes: Use proper format and cast to print dma_addr_t RDMA/nes: Make symbols used only in a single 
source file static IB/ehca: Make symbols used only in a single source file static IB/mthca: Avoid integer overflow when dealing with profile size IB/mthca: Avoid integer overflow when allocating huge ICM table IB/ipath: Fix PCI config write size used to clear linkctrl error bits RDMA/nes: Remove session_id from nes_cm stuff IB/mlx4: Micro-optimize mlx4_ib_post_send() IB/core: Add support for "send with invalidate" work requests RDMA/amso1100: Add support for "send with invalidate" work requests RDMA/nes: Free IRQ before killing tasklet IPoIB: Handle case when P_Key is deleted and re-added at same index Stefan Roscher (1): IB/ehca: Support all ibv_devinfo values in query_device() and query_port() Tom Tucker (1): RDMA/amso1100: Add check for NULL reply_msg in c2_intr() Vladimir Sokolovsky (1): IB/mlx4: Add support for resizing CQs drivers/infiniband/core/cm.c | 63 +- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/fmr_pool.c | 3 +- drivers/infiniband/core/ucma.c | 2 +- drivers/infiniband/core/uverbs.h | 4 +- drivers/infiniband/core/uverbs_cmd.c | 14 +- drivers/infiniband/core/uverbs_main.c | 28 +- drivers/infiniband/core/verbs.c | 14 +- drivers/infiniband/hw/amso1100/c2.c | 80 +- drivers/infiniband/hw/amso1100/c2.h | 16 +- drivers/infiniband/hw/amso1100/c2_ae.c | 10 +- drivers/infiniband/hw/amso1100/c2_alloc.c | 12 +- drivers/infiniband/hw/amso1100/c2_cq.c | 4 +- drivers/infiniband/hw/amso1100/c2_intr.c | 6 +- drivers/infiniband/hw/amso1100/c2_mm.c | 2 +- drivers/infiniband/hw/amso1100/c2_mq.c | 4 +- drivers/infiniband/hw/amso1100/c2_mq.h | 2 +- drivers/infiniband/hw/amso1100/c2_provider.c | 85 +- drivers/infiniband/hw/amso1100/c2_qp.c | 30 +- drivers/infiniband/hw/amso1100/c2_rnic.c | 31 +- drivers/infiniband/hw/amso1100/c2_vq.c | 2 +- drivers/infiniband/hw/amso1100/c2_wr.h | 212 +- drivers/infiniband/hw/cxgb3/cxio_dbg.c | 24 +- drivers/infiniband/hw/cxgb3/cxio_hal.c | 84 +- drivers/infiniband/hw/cxgb3/cxio_resource.c | 12 +- 
drivers/infiniband/hw/cxgb3/iwch.c | 6 +- drivers/infiniband/hw/cxgb3/iwch.h | 2 +- drivers/infiniband/hw/cxgb3/iwch_cm.c | 166 +- drivers/infiniband/hw/cxgb3/iwch_cm.h | 4 +- drivers/infiniband/hw/cxgb3/iwch_cq.c | 4 +- drivers/infiniband/hw/cxgb3/iwch_ev.c | 12 +- drivers/infiniband/hw/cxgb3/iwch_mem.c | 6 +- drivers/infiniband/hw/cxgb3/iwch_provider.c | 79 +- drivers/infiniband/hw/cxgb3/iwch_provider.h | 4 +- drivers/infiniband/hw/cxgb3/iwch_qp.c | 42 +- drivers/infiniband/hw/ehca/ehca_av.c | 31 - drivers/infiniband/hw/ehca/ehca_classes.h | 2 - drivers/infiniband/hw/ehca/ehca_cq.c | 19 - drivers/infiniband/hw/ehca/ehca_hca.c | 129 +- drivers/infiniband/hw/ehca/ehca_main.c | 19 +- drivers/infiniband/hw/ehca/ehca_mrmw.c | 42 +- drivers/infiniband/hw/ehca/ehca_pd.c | 11 - drivers/infiniband/hw/ehca/ehca_qp.c | 51 +- drivers/infiniband/hw/ehca/ehca_reqs.c | 2 +- drivers/infiniband/hw/ehca/ehca_tools.h | 16 +- drivers/infiniband/hw/ehca/ehca_uverbs.c | 19 - drivers/infiniband/hw/ipath/Makefile | 3 + drivers/infiniband/hw/ipath/ipath_7220.h | 57 + drivers/infiniband/hw/ipath/ipath_common.h | 54 +- drivers/infiniband/hw/ipath/ipath_debug.h | 2 + drivers/infiniband/hw/ipath/ipath_diag.c | 35 +- drivers/infiniband/hw/ipath/ipath_driver.c | 1041 +++++++--- drivers/infiniband/hw/ipath/ipath_eeprom.c | 428 ++++- drivers/infiniband/hw/ipath/ipath_file_ops.c | 176 ++- drivers/infiniband/hw/ipath/ipath_iba6110.c | 51 +- drivers/infiniband/hw/ipath/ipath_iba6120.c | 203 ++- drivers/infiniband/hw/ipath/ipath_iba7220.c | 2571 ++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_init_chip.c | 312 ++-- drivers/infiniband/hw/ipath/ipath_intr.c | 656 ++++--- drivers/infiniband/hw/ipath/ipath_kernel.h | 304 +++- drivers/infiniband/hw/ipath/ipath_mad.c | 110 +- drivers/infiniband/hw/ipath/ipath_qp.c | 59 +- drivers/infiniband/hw/ipath/ipath_rc.c | 67 +- drivers/infiniband/hw/ipath/ipath_registers.h | 168 +- drivers/infiniband/hw/ipath/ipath_ruc.c | 22 +- 
drivers/infiniband/hw/ipath/ipath_sd7220.c | 1462 ++++++++++++++ drivers/infiniband/hw/ipath/ipath_sd7220_img.c | 1082 ++++++++++ drivers/infiniband/hw/ipath/ipath_sdma.c | 790 ++++++++ drivers/infiniband/hw/ipath/ipath_srq.c | 5 +- drivers/infiniband/hw/ipath/ipath_stats.c | 33 +- drivers/infiniband/hw/ipath/ipath_sysfs.c | 104 +- drivers/infiniband/hw/ipath/ipath_uc.c | 8 +- drivers/infiniband/hw/ipath/ipath_ud.c | 7 +- drivers/infiniband/hw/ipath/ipath_user_sdma.c | 879 ++++++++ drivers/infiniband/hw/ipath/ipath_user_sdma.h | 54 + drivers/infiniband/hw/ipath/ipath_verbs.c | 413 ++++- drivers/infiniband/hw/ipath/ipath_verbs.h | 32 +- drivers/infiniband/hw/mlx4/cq.c | 319 +++- drivers/infiniband/hw/mlx4/mad.c | 2 +- drivers/infiniband/hw/mlx4/main.c | 25 +- drivers/infiniband/hw/mlx4/mlx4_ib.h | 15 + drivers/infiniband/hw/mlx4/qp.c | 117 +- drivers/infiniband/hw/mthca/mthca_cmd.c | 6 +- drivers/infiniband/hw/mthca/mthca_cmd.h | 1 + drivers/infiniband/hw/mthca/mthca_cq.c | 14 +- drivers/infiniband/hw/mthca/mthca_dev.h | 14 +- drivers/infiniband/hw/mthca/mthca_eq.c | 4 +- drivers/infiniband/hw/mthca/mthca_mad.c | 2 +- drivers/infiniband/hw/mthca/mthca_main.c | 15 +- drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +- drivers/infiniband/hw/mthca/mthca_profile.c | 4 +- drivers/infiniband/hw/mthca/mthca_profile.h | 2 +- drivers/infiniband/hw/mthca/mthca_provider.c | 5 +- drivers/infiniband/hw/mthca/mthca_qp.c | 28 +- drivers/infiniband/hw/mthca/mthca_wqe.h | 16 +- drivers/infiniband/hw/nes/nes.c | 15 +- drivers/infiniband/hw/nes/nes.h | 32 +- drivers/infiniband/hw/nes/nes_cm.c | 131 +- drivers/infiniband/hw/nes/nes_cm.h | 35 - drivers/infiniband/hw/nes/nes_hw.c | 49 +- drivers/infiniband/hw/nes/nes_nic.c | 26 +- drivers/infiniband/hw/nes/nes_utils.c | 2 +- drivers/infiniband/hw/nes/nes_verbs.c | 29 +- drivers/infiniband/ulp/ipoib/Makefile | 3 +- drivers/infiniband/ulp/ipoib/ipoib.h | 10 + drivers/infiniband/ulp/ipoib/ipoib_cm.c | 15 +- 
drivers/infiniband/ulp/ipoib/ipoib_ethtool.c | 99 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 126 +- drivers/infiniband/ulp/ipoib/ipoib_main.c | 33 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 3 + drivers/infiniband/ulp/iser/iser_initiator.c | 4 +- drivers/infiniband/ulp/iser/iser_verbs.c | 5 +- drivers/infiniband/ulp/srp/ib_srp.c | 7 +- drivers/net/mlx4/catas.c | 2 +- drivers/net/mlx4/cmd.c | 3 +- drivers/net/mlx4/cq.c | 72 +- drivers/net/mlx4/eq.c | 5 +- drivers/net/mlx4/fw.c | 13 + drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/intf.c | 8 +- drivers/net/mlx4/main.c | 6 +- drivers/net/mlx4/mcg.c | 12 +- drivers/net/mlx4/mlx4.h | 4 +- include/linux/mlx4/cmd.h | 2 +- include/linux/mlx4/cq.h | 19 +- include/linux/mlx4/device.h | 1 + include/linux/mlx4/driver.h | 3 +- include/linux/mlx4/qp.h | 15 +- include/rdma/ib_user_verbs.h | 5 +- include/rdma/ib_verbs.h | 35 +- net/sunrpc/xprtrdma/verbs.c | 1 - 131 files changed, 11739 insertions(+), 2287 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_7220.h create mode 100644 drivers/infiniband/hw/ipath/ipath_iba7220.c create mode 100644 drivers/infiniband/hw/ipath/ipath_sd7220.c create mode 100644 drivers/infiniband/hw/ipath/ipath_sd7220_img.c create mode 100644 drivers/infiniband/hw/ipath/ipath_sdma.c create mode 100644 drivers/infiniband/hw/ipath/ipath_user_sdma.c create mode 100644 drivers/infiniband/hw/ipath/ipath_user_sdma.h create mode 100644 drivers/infiniband/ulp/ipoib/ipoib_ethtool.c From andrea at qumranet.com Thu Apr 17 08:51:57 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 17 Apr 2008 17:51:57 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: <20080416163337.GJ22493@sgi.com> Message-ID: <20080417155157.GC17187@duo.random> On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > On Wed, 16 Apr 2008, Robin Holt wrote: > > > I don't think this lock mechanism is completely working. 
I have > > gotten a few failures trying to dereference 0x100100 which appears to > > be LIST_POISON1. > > How does xpmem unregistering of notifiers work? Especially are you using mmu_notifier_unregister? From sean.hefty at intel.com Thu Apr 17 08:58:36 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 17 Apr 2008 08:58:36 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com><000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> Message-ID: <000001c8a0a3$e5a7c530$9c98070a@amr.corp.intel.com> >How about "iWARP ethernet NICs support RDMA over hardware-offloaded TCP"? This is more descriptive and fine with me. From sean.hefty at intel.com Thu Apr 17 09:14:15 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 17 Apr 2008 09:14:15 -0700 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D36D79@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903D36D79@mtlexch01.mtl.com> Message-ID: <000101c8a0a6$153306a0$9c98070a@amr.corp.intel.com> > In the spec, a normal flow to close a connection is > at the client side: State "Established" ---- send DREQ ---> > State "DREQ sent" --- receive DREP ---> State "TimeWait"  ---> > State "Idle" Yes - the CM kernel code follows this state machine.  > According to the code and tests I did, it seems that ib_cm doesn't > wait for DREP and goes directly from "DREQ sent" into "TimeWait". This can happen in specific situations, such as errors, if the user destroys the cm_id without waiting for the DREP (treated as a DREQ timeout), or if both sides initiate a DREQ. > I think that this is a bug, am I right? I don't see that the code follows the behavior that you're describing. In ib_send_cm_dreq(), the cm_id state changes to DREQ_SENT. 
In cm_drep_handler() (called when a DREP is received), the cm_id state is verified to be DREQ_SENT, then transitioned to TIMEWAIT. If you can describe the test details more, I can try to find the most likely code path that's being hit. It's possible that you're hitting one of the situations mentioned above. - Sean From holt at sgi.com Thu Apr 17 09:36:42 2008 From: holt at sgi.com (Robin Holt) Date: Thu, 17 Apr 2008 11:36:42 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080417155157.GC17187@duo.random> References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> Message-ID: <20080417163642.GE11364@sgi.com> On Thu, Apr 17, 2008 at 05:51:57PM +0200, Andrea Arcangeli wrote: > On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > > On Wed, 16 Apr 2008, Robin Holt wrote: > > > > > I don't think this lock mechanism is completely working. I have > > > gotten a few failures trying to dereference 0x100100 which appears to > > > be LIST_POISON1. > > > > How does xpmem unregistering of notifiers work? > > Especially are you using mmu_notifier_unregister? In this case, we are not making the call to unregister, we are waiting for the _release callout which has already removed it from the list. In the event that the user has removed all the grants, we use unregister. That typically does not occur. We merely wait for exit processing to clean up the structures. 
Thanks, Robin From andrea at qumranet.com Thu Apr 17 10:14:43 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 17 Apr 2008 19:14:43 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080417163642.GE11364@sgi.com> References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> <20080417163642.GE11364@sgi.com> Message-ID: <20080417171443.GM17187@duo.random> On Thu, Apr 17, 2008 at 11:36:42AM -0500, Robin Holt wrote: > In this case, we are not making the call to unregister, we are waiting > for the _release callout which has already removed it from the list. > > In the event that the user has removed all the grants, we use unregister. > That typically does not occur. We merely wait for exit processing to > clean up the structures. Then it's very strange. LIST_POISON1 is set in n->next. If it was a second hlist_del triggering the bug, in theory LIST_POISON2 should trigger first, so perhaps it's really a notifier running despite mm_lock being taken? Could you post a full stack trace so I can see who's running into LIST_POISON1? If it's really a notifier running outside of some mm_lock, that will be _immediately_ visible from the stack trace that triggered the LIST_POISON1! Also note, EMM isn't using the clean hlist_del, it's implementing lists by hand (with zero runtime gain), so all the debugging may not be present in EMM. So if it's really a mm_lock race, and it only triggers with mmu notifiers and not with EMM, it doesn't necessarily mean EMM is bug free. If you've a full stack trace it would greatly help to verify what is mangling the list when the oops triggers. Thanks! 
Andrea From holt at sgi.com Thu Apr 17 10:25:56 2008 From: holt at sgi.com (Robin Holt) Date: Thu, 17 Apr 2008 12:25:56 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080417171443.GM17187@duo.random> References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> <20080417163642.GE11364@sgi.com> <20080417171443.GM17187@duo.random> Message-ID: <20080417172556.GF11364@sgi.com> On Thu, Apr 17, 2008 at 07:14:43PM +0200, Andrea Arcangeli wrote: > On Thu, Apr 17, 2008 at 11:36:42AM -0500, Robin Holt wrote: > > In this case, we are not making the call to unregister, we are waiting > > for the _release callout which has already removed it from the list. > > > > In the event that the user has removed all the grants, we use unregister. > > That typically does not occur. We merely wait for exit processing to > > clean up the structures. > > Then it's very strange. LIST_POISON1 is set in n->next. If it was a > second hlist_del triggering the bug in theory list_poison2 should > trigger first, so perhaps it's really a notifier running despite a > mm_lock is taken? Could you post a full stack trace so I can see who's > running into LIST_POISON1? If it's really a notifier running outside > of some mm_lock that will be _immediately_ visible from the stack > trace that triggered the LIST_POISON1! > > Also note, EMM isn't using the clean hlist_del, it's implementing list > by hand (with zero runtime gain) so all the debugging may not be > existent in EMM, so if it's really a mm_lock race, and it only > triggers with mmu notifiers and not with EMM, it doesn't necessarily > mean EMM is bug free. If you've a full stack trace it would greatly > help to verify what is mangling over the list when the oops triggers. The stack trace is below. I did not do this level of testing on emm so I can not compare the two in this area. This is for a different, but equivalent failure. 
I just reproduce the LIST_POISON1 failure without trying to reproduce the exact same failure as I had documented earlier (lost that stack trace, sorry). Thanks, Robin <1>Unable to handle kernel paging request at virtual address 0000000000100100 <4>mpi006.f.x[23403]: Oops 11012296146944 [1] <4>Modules linked in: nfs lockd sunrpc binfmt_misc thermal processor fan button loop md_mod dm_mod xpmem xp mspec sg <4> <4>Pid: 23403, CPU 114, comm: mpi006.f.x <4>psr : 0000121008526010 ifs : 800000000000038b ip : [] Not tainted (2.6.25-rc8) <4>ip is at __mmu_notifier_invalidate_range_start+0x81/0x120 <4>unat: 0000000000000000 pfs : 000000000000038b rsc : 0000000000000003 <4>rnat: a000000100149a00 bsps: a000000000010740 pr : 66555666a9599aa9 <4>ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f <4>csd : 0000000000000000 ssd : 0000000000000000 <4>b0 : a00000010015d670 b6 : a0000002101ddb40 b7 : a00000010000eb50 <4>f6 : 1003e2222222222222222 f7 : 000000000000000000000 <4>f8 : 000000000000000000000 f9 : 000000000000000000000 <4>f10 : 000000000000000000000 f11 : 000000000000000000000 <4>r1 : a000000100ef1190 r2 : e0000e6080cc1940 r3 : a0000002101edd10 <4>r8 : e0000e6080cc1970 r9 : 0000000000000000 r10 : e0000e6080cc19c8 <4>r11 : 20000003a6480000 r12 : e0000c60d31efb90 r13 : e0000c60d31e0000 <4>r14 : 000000000000004d r15 : e0000e6080cc1914 r16 : e0000e6080cc1970 <4>r17 : 20000003a6480000 r18 : 20000007bf900000 r19 : 0000000000040000 <4>r20 : e0000c60d31e0000 r21 : 0000000000000010 r22 : e0000e6080cc19a8 <4>r23 : e0000c60c55f1120 r24 : e0000c60d31efda0 r25 : e0000c60d31efd98 <4>r26 : e0000e60812166d0 r27 : e0000c60d31efdc0 r28 : e0000c60d31efdb8 <4>r29 : e0000c60d31e0b60 r30 : 0000000000000000 r31 : 0000000000000081 <4> <4>Call Trace: <4> [] show_stack+0x40/0xa0 <4> sp=e0000c60d31ef760 bsp=e0000c60d31e11f0 <4> [] show_regs+0x850/0x8a0 <4> sp=e0000c60d31ef930 bsp=e0000c60d31e1198 <4> [] die+0x1b0/0x2e0 <4> sp=e0000c60d31ef930 bsp=e0000c60d31e1150 <4> [] 
ia64_do_page_fault+0x8d0/0xa40 <4> sp=e0000c60d31ef930 bsp=e0000c60d31e1100 <4> [] ia64_leave_kernel+0x0/0x270 <4> sp=e0000c60d31ef9c0 bsp=e0000c60d31e1100 <4> [] __mmu_notifier_invalidate_range_start+0x80/0x120 <4> sp=e0000c60d31efb90 bsp=e0000c60d31e10a8 <4> [] unmap_vmas+0x70/0x14c0 <4> sp=e0000c60d31efb90 bsp=e0000c60d31e0fa8 <4> [] zap_page_range+0x40/0x60 <4> sp=e0000c60d31efda0 bsp=e0000c60d31e0f70 <4> [] xpmem_clear_PTEs+0x350/0x560 [xpmem] <4> sp=e0000c60d31efdb0 bsp=e0000c60d31e0ef0 <4> [] xpmem_remove_seg+0x3f0/0x700 [xpmem] <4> sp=e0000c60d31efde0 bsp=e0000c60d31e0ea8 <4> [] xpmem_remove_segs_of_tg+0x80/0x140 [xpmem] <4> sp=e0000c60d31efe10 bsp=e0000c60d31e0e78 <4> [] xpmem_mmu_notifier_release+0x40/0x80 [xpmem] <4> sp=e0000c60d31efe10 bsp=e0000c60d31e0e58 <4> [] __mmu_notifier_release+0xb0/0x100 <4> sp=e0000c60d31efe10 bsp=e0000c60d31e0e38 <4> [] exit_mmap+0x50/0x180 <4> sp=e0000c60d31efe10 bsp=e0000c60d31e0e10 <4> [] mmput+0x70/0x180 <4> sp=e0000c60d31efe20 bsp=e0000c60d31e0dd8 <4> [] exit_mm+0x1f0/0x220 <4> sp=e0000c60d31efe20 bsp=e0000c60d31e0da0 <4> [] do_exit+0x4e0/0xf40 <4> sp=e0000c60d31efe20 bsp=e0000c60d31e0d58 <4> [] do_group_exit+0x180/0x1c0 <4> sp=e0000c60d31efe30 bsp=e0000c60d31e0d20 <4> [] sys_exit_group+0x20/0x40 <4> sp=e0000c60d31efe30 bsp=e0000c60d31e0cc8 <4> [] ia64_ret_from_syscall+0x0/0x20 <4> sp=e0000c60d31efe30 bsp=e0000c60d31e0cc8 <4> [] __kernel_syscall_via_break+0x0/0x20 <4> sp=e0000c60d31f0000 bsp=e0000c60d31e0cc8 From terrywatson at live.com Thu Apr 17 11:21:52 2008 From: terrywatson at live.com (terry watson) Date: Thu, 17 Apr 2008 18:21:52 +0000 Subject: [ofa-general] Is IBIS only for querying OpenSM? Message-ID: Hi all, I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? 
Could I then attempt to access a partition that I was originally excluded from accessing? I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. Thanks, Dave From amirv at mellanox.co.il Thu Apr 17 11:25:56 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Thu, 17 Apr 2008 21:25:56 +0300 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <000101c8a0a6$153306a0$9c98070a@amr.corp.intel.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36E8D@mtlexch01.mtl.com> When the client closes the connection it calls ib_destroy_cm_id(), which calls cm_destroy_id(). In my scenario this happens when the CM is in state "Established". In this state ib_send_cm_dreq() is called. This function sends a DREQ and changes state to "DREQ sent". After that the function returns and the switch is tried again, this time in state "DREQ sent". There the state is changed into "TimeWait". It means that when calling ib_destroy_cm_id(), the CM sends a DREQ and goes immediately to state "TimeWait" without waiting for DREP. This looks like the usual situation and not a special one. I'm looking at the code from the head of ofed git in openfabrics. 
- Amir -----Original Message----- From: Sean Hefty [mailto:sean.hefty at intel.com] Sent: Thursday, April 17, 2008 19:14 To: Amir Vadai Cc: general at lists.openfabrics.org Subject: RE: CM goes to timewait state without waiting for disconnect reply > In the spec, a normal flow to close a connection is at the client > side: State "Established" ---- send DREQ ---> State "DREQ sent" --- > receive DREP ---> State "TimeWait"  ---> State "Idle" Yes - the CM kernel code follows this state machine.  > According to the code and tests I did, it seems that ib_cm doesn't > wait for DREP and goes directly from "DREQ sent" into "TimeWait". This can happen in specific situations, such as errors, if the user destroys the cm_id without waiting for the DREP (treated as a DREQ timeout), or if both sides initiate a DREQ. > I think that this is a bug, am I right? I don't see that the code follows the behavior that you're describing. In ib_send_cm_dreq(), the cm_id state changes to DREQ_SENT. In cm_drep_handler() (called when a DREP is received), the cm_id state is verified to be DREQ_SENT, then transitioned to TIMEWAIT. If you can describe the test details more, I can try to find the most likely code path that's being hit. It's possible that you're hitting one of the situations mentioned above. - Sean From sean.hefty at intel.com Thu Apr 17 11:34:16 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 17 Apr 2008 11:34:16 -0700 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D36E8D@mtlexch01.mtl.com> References: <000101c8a0a6$153306a0$9c98070a@amr.corp.intel.com> <6C2C79E72C305246B504CBA17B5500C903D36E8D@mtlexch01.mtl.com> Message-ID: <000201c8a0b9$a4638a80$9c98070a@amr.corp.intel.com> >When the client closes the connection it calls ib_destroy_cm_id() who calls >cm_destroy_id(). >In my scenario it happen when the CM is in state "Established". In this state >ib_send_cm_dreq() is called.
>This function sends a DREQ and change state to "DREQ sent". >After that the function returns and the switch is tried again this time we're >in state "DREQ sent". >There the state is changed into "TimeWait". Yes - this will result in transitioning into timewait immediately after sending the DREQ. By destroying the cm_id, the user has indicated that they do not want to wait for a DREP, nor do they care about when timewait has exited. If a DREQ is received while the cm_id is in timewait, it will generate a DREP in response. DREP messages while in timewait are simply dropped. What exactly is the problem that you're seeing? - Sean From amirv at mellanox.co.il Thu Apr 17 11:41:03 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Thu, 17 Apr 2008 21:41:03 +0300 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <000201c8a0b9$a4638a80$9c98070a@amr.corp.intel.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36E90@mtlexch01.mtl.com> There are some problems that I hope are related to that. But the one I know for sure is: I have a very busy SDP server with lots of connections coming up and down, and a client with many threads that open and close connections. What I see is that a connection request comes from the client to the server, and the server replies with a reject; the reason for the reject is that a timewait structure already exists for this QPN. That is because the client thinks the connection is closed and reuses the QPN, but the server didn't finish cleaning up the connection. The bottom line: I get a reject on SDP socket open. - Amir -----Original Message----- From: Sean Hefty [mailto:sean.hefty at intel.com] Sent: Thursday, April 17, 2008 21:34 To: Amir Vadai Cc: general at lists.openfabrics.org; Oren Duer Subject: RE: CM goes to timewait state without waiting for disconnect reply >When the client closes the connection it calls ib_destroy_cm_id() who >calls cm_destroy_id().
>In my scenario it happen when the CM is in state "Established". In this >state >ib_send_cm_dreq() is called. >This function sends a DREQ and change state to "DREQ sent". >After that the function returns and the switch is tried again this time >we're in state "DREQ sent". >There the state is changed into "TimeWait". Yes - this will result in transitioning into timewait immediately after sending the DREQ. By destroying the cm_id, the user has indicated that they do not want to wait for a DREP, nor do they care about when timewait has exited. If a DREQ is received while the cm_id is in timewait, it will generate a DREP in response. DREP messages while in timewait are simply dropped. What exactly is the problem that you're seeing? - Sean From sean.hefty at intel.com Thu Apr 17 11:51:36 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 17 Apr 2008 11:51:36 -0700 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D36E90@mtlexch01.mtl.com> References: <000201c8a0b9$a4638a80$9c98070a@amr.corp.intel.com> <6C2C79E72C305246B504CBA17B5500C903D36E90@mtlexch01.mtl.com> Message-ID: <000301c8a0bc$1058a5c0$9c98070a@amr.corp.intel.com> >What I see is that a connection request is coming from the client to the server >And the server reply with reject - the reason for the reject is that a timewait >structure >already exists for this QPN. And that's because the client thinks that a >connection is closed and reuse the QPN but the server didn't finish cleaning up >the connection. This is an unavoidable situation. There's no coordination between the timewait states on different systems, so it's always possible for one to re-connect before the other system has exited timewait. However, in your case, the problem is that the client is trying to re-use the QPN outside of knowing when it has exited the local timewait state. 
Instead, have the client issue a DREQ, and then wait for the timewait state to exit before trying to re-use the QPN. This would then be the sequence:

client                        server
sends DREQ
enters timewait
                              sends DREP
                              enters timewait
exits timewait
destroy cm_id
new connection

The hope at this point is that the server exits timewait before the client does, which, while likely, is not guaranteed. - Sean From clameter at sgi.com Thu Apr 17 12:10:52 2008 From: clameter at sgi.com (Christoph Lameter) Date: Thu, 17 Apr 2008 12:10:52 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080417171443.GM17187@duo.random> References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> <20080417163642.GE11364@sgi.com> <20080417171443.GM17187@duo.random> Message-ID: On Thu, 17 Apr 2008, Andrea Arcangeli wrote: > Also note, EMM isn't using the clean hlist_del, it's implementing list > by hand (with zero runtime gain) so all the debugging may not be > existent in EMM, so if it's really a mm_lock race, and it only > triggers with mmu notifiers and not with EMM, it doesn't necessarily > mean EMM is bug free. If you've a full stack trace it would greatly > help to verify what is mangling over the list when the oops triggers. EMM was/is using a singly linked list, which allows atomic updates. It looked cleaner to me since a doubly linked list must update two pointers. I have not seen docs on the locking, so I'm not sure why you use rcu operations here. Isn't the requirement to have either the rmap locks or mmap_sem held enough to guarantee the consistency of the doubly linked list?
From amirv at mellanox.co.il Thu Apr 17 12:49:41 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Thu, 17 Apr 2008 22:49:41 +0300 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <000301c8a0bc$1058a5c0$9c98070a@amr.corp.intel.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36EA3@mtlexch01.mtl.com> I understand - I'll make sure the flow you described will be used. Thanks a lot, - Amir. -----Original Message----- From: Sean Hefty [mailto:sean.hefty at intel.com] Sent: Thursday, April 17, 2008 21:52 To: Amir Vadai Cc: general at lists.openfabrics.org; Oren Duer Subject: RE: CM goes to timewait state without waiting for disconnect reply >What I see is that a connection request is coming from the client to >the server And the server reply with reject - the reason for the reject >is that a timewait structure already exists for this QPN. And that's >because the client thinks that a connection is closed and reuse the QPN >but the server didn't finish cleaning up the connection. This is an unavoidable situation. There's no coordination between the timewait states on different systems, so it's always possible for one to re-connect before the other system has exited timewait. However, in your case, the problem is that the client is trying to re-use the QPN outside of knowing when it has exited the local timewait state. Instead, have the client issue a DREQ, and then wait for the timewait state to exit before trying to re-use the QPN. This would then be the sequence: client server sends DREQ enters timewait sends DREP enters timewait exits timewait destroy cm_id new connection Your hope at this point is that the server exits timewait before the client will, while, likely, is not guaranteed.
- Sean From andrea at qumranet.com Thu Apr 17 15:16:55 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 18 Apr 2008 00:16:55 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> <20080417163642.GE11364@sgi.com> <20080417171443.GM17187@duo.random> Message-ID: <20080417221655.GA9287@duo.random> On Thu, Apr 17, 2008 at 12:10:52PM -0700, Christoph Lameter wrote: > EMM was/is using a single linked list which allows atomic updates. Looked > cleaner to me since doubly linked list must update two pointers. It would be cleaner if it provided an abstraction in list.h. What matters is the memory taken by the list head for this usage. > I have not seen docs on the locking so not sure why you use rcu > operations here? Isnt the requirement to have either rmap locks or > mmap_sem held enough to guarantee the consistency of the doubly linked list? Yes, exactly, I'm not using rcu anymore.
From harsha at zresearch.com Fri Apr 18 00:34:35 2008 From: harsha at zresearch.com (Harshavardhana) Date: Fri, 18 Apr 2008 13:04:35 +0530 (IST) Subject: [ofa-general] libibverbs-1.1: issue RLIMIT_MEMLOCK Message-ID: <12302.220.227.64.166.1208504075.squirrel@zresearch.com> Hi Openfabrics, A question, or probably a bug, after upgrading to the new OFED with the libibverbs-1.1 release. I experienced problems running the Fluent CFD application with HP-MPI 2.2.5.1: the libibverbs initialization failed because the library throws an error saying the maximum pinnable memory, i.e. memlock, is insufficient: "ibv_create_qp failed" "Unable to Initialize RDMA device". I didn't have this problem in earlier versions. I fixed it by changing the hard/soft limit to more than 32k, which was the default on my system. But I am wondering why the RDMA initialization fails at 32k, which didn't happen with libibverbs version 1.0.4. It should throw a warning according to the check_memlock function in the libibverbs source directory. But that is not happening, and in turn ibv_create_qp fails. Would it not be better for the library itself to set the rlimit using setrlimit()? This looks to be a change with the ConnectX IB 4th Gen InfiniBand hardware in place, with libibverbs requesting memlock to be more than 32k.
Regards & Thanks -- Harshavardhana "Software gets slower faster as Hardware gets faster" From philippe.gregoire at cea.fr Fri Apr 18 00:35:42 2008 From: philippe.gregoire at cea.fr (Philippe Gregoire) Date: Fri, 18 Apr 2008 09:35:42 +0200 Subject: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: References: Message-ID: <48084F4E.3020705@cea.fr> terry watson wrote: > Hi all, > > I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? Could I then attempt to access a partition that I was originally excluded from accessing? > > I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. > > Thanks, > Dave > _________________________________________________________________ > Discover the new Windows Vista > http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > Partitions are managed only by the subnet manager - either opensm running on a node in the fabric, or an embedded subnet manager on a switch. For opensm, partitions are defined in the configuration file /etc/opensm/partitions.conf; for an embedded subnet manager, you have to configure the partitions using the CLI or GUI provided by the switch. Defining a partition mainly means choosing a pkey and the node ports that belong to it, with their membership (limited or not). The subnet manager assigns the pkeys to the ports of the node when the ib kernel modules are loaded.
You can see the partitions the IB port belongs to (I mean those defined by the subnet manager) with:

# grep -v 0x0000 /sys/class/infiniband/mthca0/ports/1/pkeys/*
/sys/class/infiniband/mthca0/ports/1/pkeys/0:0xffff
/sys/class/infiniband/mthca0/ports/1/pkeys/1:0x8001
/sys/class/infiniband/mthca0/ports/1/pkeys/2:0x8002
/sys/class/infiniband/mthca0/ports/1/pkeys/3:0x8003
/sys/class/infiniband/mthca0/ports/1/pkeys/4:0x8010

A port may belong to many partitions. Nodes (ports) may have different partition configurations. The partition order for a port is not always the same (it may depend on the chronology of partition declarations in the subnet manager). Over these partitions, you can define new IP (IP over IB) interfaces by creating files like /etc/sysconfig/network-scripts/ifcfg-ib0.8002:

# cat /etc/sysconfig/network-scripts/ifcfg-ib0.8002
DEVICE=ib0.8002
BOOTPROTO=static
IPADDR=XXX.YYY.ZZZ.TTT
NETMASK=255.255.255.0
NETWORK=255.255.255.0
ONBOOT=yes

The openibd script creates the child interface and configures it at system startup using a special sysfs file: echo $pkey > /sys/class/net/ib0/create_child This command only creates a child interface on the node; communications on this interface will not work until you add the node port to the corresponding partition in the subnet manager configuration. Then you will see the pkey appear automatically in the files /sys/class/infiniband/mthca0/ports/1/pkeys/* on the node.
[root at cors118 ~]# echo 0x8009 > /sys/class/net/ib0/create_child
[root at cors118 ~]# dmesg | grep 8009
divert: not allocating divert_blk for non-ethernet device ib0.8009
[root at cors118 ~]# grep -v 0x0000 /sys/class/infiniband/mthca0/ports/1/pkeys/*
/sys/class/infiniband/mthca0/ports/1/pkeys/0:0xffff
/sys/class/infiniband/mthca0/ports/1/pkeys/1:0x8001
/sys/class/infiniband/mthca0/ports/1/pkeys/2:0x8002
/sys/class/infiniband/mthca0/ports/1/pkeys/3:0x8003
/sys/class/infiniband/mthca0/ports/1/pkeys/4:0x8010
[root at cors118 ~]# ifconfig -a | grep 8009
ib0.8009 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
[root at cors118 ~]# echo 0x8009 > /sys/class/net/ib0/delete_child
[root at cors118 ~]# dmesg | grep 8009
divert: not allocating divert_blk for non-ethernet device ib0.8009
divert: no divert_blk to free, ib0.8009 not ethernet

To use MPI with partitions, you also have to configure it (in the configuration file). For MVAPICH you must use VIADEV_DEFAULT_PKEY_IX or VIADEV_DEFAULT_PKEY in the config file /usr/mpi/gcc/mvapich-1.0.0/etc/mvapich.conf. At CEA, I'm using VIADEV_DEFAULT_PKEY (pkey value) as we have nodes with different partition configurations. Hoping this will help you. Regards Philippe Gregoire CEA/DAM -------------- next part -------------- An HTML attachment was scrubbed... URL: From yevgenyp at mellanox.co.il Fri Apr 18 05:16:39 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Fri, 18 Apr 2008 15:16:39 +0300 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) Message-ID: <48089127.2040905@mellanox.co.il> >From ca3eb5aef54025b11c1f0b4d0abe9eef8e349048 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Thu, 17 Apr 2008 15:38:17 +0300 Subject: [PATCH] mlx4: Moving db management to mlx4_core mlx4_ib is no longer the only customer of mlx4_core. Thus the doorbell allocation was moved to the low level driver (same as buffer allocation).
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 6 +- drivers/infiniband/hw/mlx4/doorbell.c | 131 +--------------------------- drivers/infiniband/hw/mlx4/mlx4_ib.h | 30 +----- drivers/infiniband/hw/mlx4/qp.c | 7 +- drivers/infiniband/hw/mlx4/srq.c | 6 +- drivers/net/mlx4/alloc.c | 157 +++++++++++++++++++++++++++++++++ drivers/net/mlx4/main.c | 3 + drivers/net/mlx4/mlx4.h | 3 + include/linux/mlx4/device.h | 50 +++++++++++ 9 files changed, 231 insertions(+), 162 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 3557e7e..3e7e6fe 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -204,7 +204,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector uar = &to_mucontext(context)->uar; } else { - err = mlx4_ib_db_alloc(dev, &cq->db, 1); + err = mlx4_db_alloc(dev->dev, dev->ib_dev.dma_device, &cq->db, 1); if (err) goto err_cq; @@ -250,7 +250,7 @@ err_mtt: err_db: if (!context) - mlx4_ib_db_free(dev, &cq->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &cq->db); err_cq: kfree(cq); @@ -435,7 +435,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) ib_umem_release(mcq->umem); } else { mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); - mlx4_ib_db_free(dev, &mcq->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &mcq->db); } kfree(mcq); diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c index 1c36087..d17b36b 100644 --- a/drivers/infiniband/hw/mlx4/doorbell.c +++ b/drivers/infiniband/hw/mlx4/doorbell.c @@ -34,135 +34,10 @@ #include "mlx4_ib.h" -struct mlx4_ib_db_pgdir { - struct list_head list; - DECLARE_BITMAP(order0, MLX4_IB_DB_PER_PAGE); - DECLARE_BITMAP(order1, MLX4_IB_DB_PER_PAGE / 2); - unsigned long *bits[2]; - __be32 *db_page; - dma_addr_t db_dma; -}; - -static struct mlx4_ib_db_pgdir *mlx4_ib_alloc_db_pgdir(struct mlx4_ib_dev *dev) -{ - struct mlx4_ib_db_pgdir *pgdir; - - pgdir = kzalloc(sizeof 
*pgdir, GFP_KERNEL); - if (!pgdir) - return NULL; - - bitmap_fill(pgdir->order1, MLX4_IB_DB_PER_PAGE / 2); - pgdir->bits[0] = pgdir->order0; - pgdir->bits[1] = pgdir->order1; - pgdir->db_page = dma_alloc_coherent(dev->ib_dev.dma_device, - PAGE_SIZE, &pgdir->db_dma, - GFP_KERNEL); - if (!pgdir->db_page) { - kfree(pgdir); - return NULL; - } - - return pgdir; -} - -static int mlx4_ib_alloc_db_from_pgdir(struct mlx4_ib_db_pgdir *pgdir, - struct mlx4_ib_db *db, int order) -{ - int o; - int i; - - for (o = order; o <= 1; ++o) { - i = find_first_bit(pgdir->bits[o], MLX4_IB_DB_PER_PAGE >> o); - if (i < MLX4_IB_DB_PER_PAGE >> o) - goto found; - } - - return -ENOMEM; - -found: - clear_bit(i, pgdir->bits[o]); - - i <<= o; - - if (o > order) - set_bit(i ^ 1, pgdir->bits[order]); - - db->u.pgdir = pgdir; - db->index = i; - db->db = pgdir->db_page + db->index; - db->dma = pgdir->db_dma + db->index * 4; - db->order = order; - - return 0; -} - -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order) -{ - struct mlx4_ib_db_pgdir *pgdir; - int ret = 0; - - mutex_lock(&dev->pgdir_mutex); - - list_for_each_entry(pgdir, &dev->pgdir_list, list) - if (!mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)) - goto out; - - pgdir = mlx4_ib_alloc_db_pgdir(dev); - if (!pgdir) { - ret = -ENOMEM; - goto out; - } - - list_add(&pgdir->list, &dev->pgdir_list); - - /* This should never fail -- we just allocated an empty page: */ - WARN_ON(mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)); - -out: - mutex_unlock(&dev->pgdir_mutex); - - return ret; -} - -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db) -{ - int o; - int i; - - mutex_lock(&dev->pgdir_mutex); - - o = db->order; - i = db->index; - - if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { - clear_bit(i ^ 1, db->u.pgdir->order0); - ++o; - } - - i >>= o; - set_bit(i, db->u.pgdir->bits[o]); - - if (bitmap_full(db->u.pgdir->order1, MLX4_IB_DB_PER_PAGE / 2)) { - dma_free_coherent(dev->ib_dev.dma_device, 
PAGE_SIZE, - db->u.pgdir->db_page, db->u.pgdir->db_dma); - list_del(&db->u.pgdir->list); - kfree(db->u.pgdir); - } - - mutex_unlock(&dev->pgdir_mutex); -} - -struct mlx4_ib_user_db_page { - struct list_head list; - struct ib_umem *umem; - unsigned long user_virt; - int refcnt; -}; - int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db) + struct mlx4_db *db) { - struct mlx4_ib_user_db_page *page; + struct mlx4_user_db_page *page; struct ib_umem_chunk *chunk; int err = 0; @@ -202,7 +77,7 @@ out: return err; } -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db) +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db) { mutex_lock(&context->db_page_mutex); diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 9e63732..e7514e4 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -43,24 +43,6 @@ #include #include -enum { - MLX4_IB_DB_PER_PAGE = PAGE_SIZE / 4 -}; - -struct mlx4_ib_db_pgdir; -struct mlx4_ib_user_db_page; - -struct mlx4_ib_db { - __be32 *db; - union { - struct mlx4_ib_db_pgdir *pgdir; - struct mlx4_ib_user_db_page *user_page; - } u; - dma_addr_t dma; - int index; - int order; -}; - struct mlx4_ib_ucontext { struct ib_ucontext ibucontext; struct mlx4_uar uar; @@ -88,7 +70,7 @@ struct mlx4_ib_cq { struct mlx4_cq mcq; struct mlx4_ib_cq_buf buf; struct mlx4_ib_cq_resize *resize_buf; - struct mlx4_ib_db db; + struct mlx4_db db; spinlock_t lock; struct mutex resize_mutex; struct ib_umem *umem; @@ -127,7 +109,7 @@ struct mlx4_ib_qp { struct mlx4_qp mqp; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; struct mlx4_ib_wq rq; u32 doorbell_qpn; @@ -154,7 +136,7 @@ struct mlx4_ib_srq { struct ib_srq ibsrq; struct mlx4_srq msrq; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; u64 *wrid; spinlock_t lock; int head; @@ -248,11 +230,9 @@ static inline 
struct mlx4_ib_ah *to_mah(struct ib_ah *ibah) return container_of(ibah, struct mlx4_ib_ah, ibah); } -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order); -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db); int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db); -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db); + struct mlx4_db *db); +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db); struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc); int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt, diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index b75efae..e65b8e4 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -514,7 +514,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err; if (!init_attr->srq) { - err = mlx4_ib_db_alloc(dev, &qp->db, 0); + err = mlx4_db_alloc(dev->dev, dev->ib_dev.dma_device, + &qp->db, 0); if (err) goto err; @@ -580,7 +581,7 @@ err_buf: err_db: if (!pd->uobject && !init_attr->srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &qp->db); err: return err; @@ -666,7 +667,7 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, kfree(qp->rq.wrid); mlx4_buf_free(dev->dev, qp->buf_size, &qp->buf); if (!qp->ibqp.srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &qp->db); } } diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index beaa3b0..936dc88 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -129,7 +129,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, if (err) goto err_mtt; } else { - err = mlx4_ib_db_alloc(dev, &srq->db, 0); + err = mlx4_db_alloc(dev->dev, dev->ib_dev.dma_device, &srq->db, 0); 
if (err) goto err_srq; @@ -200,7 +200,7 @@ err_buf: err_db: if (!pd->uobject) - mlx4_ib_db_free(dev, &srq->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &srq->db); err_srq: kfree(srq); @@ -267,7 +267,7 @@ int mlx4_ib_destroy_srq(struct ib_srq *srq) kfree(msrq->wrid); mlx4_buf_free(dev->dev, msrq->msrq.max << msrq->msrq.wqe_shift, &msrq->buf); - mlx4_ib_db_free(dev, &msrq->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &msrq->db); } kfree(msrq); diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 75ef9d0..b6b00eb 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -196,3 +196,160 @@ void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf) } } EXPORT_SYMBOL_GPL(mlx4_buf_free); + +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) +{ + struct mlx4_db_pgdir *pgdir; + + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); + if (!pgdir) + return NULL; + + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); + pgdir->bits[0] = pgdir->order0; + pgdir->bits[1] = pgdir->order1; + pgdir->db_page = dma_alloc_coherent(dma_device, PAGE_SIZE, + &pgdir->db_dma, GFP_KERNEL); + if (!pgdir->db_page) { + kfree(pgdir); + return NULL; + } + + return pgdir; +} + +static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir, + struct mlx4_db *db, int order) +{ + int o; + int i; + + for (o = order; o <= 1; ++o) { + i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); + if (i < MLX4_DB_PER_PAGE >> o) + goto found; + } + + return -ENOMEM; + +found: + clear_bit(i, pgdir->bits[o]); + + i <<= o; + + if (o > order) + set_bit(i ^ 1, pgdir->bits[order]); + + db->u.pgdir = pgdir; + db->index = i; + db->db = pgdir->db_page + db->index; + db->dma = pgdir->db_dma + db->index * 4; + db->order = order; + + return 0; +} + +int mlx4_db_alloc(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db, int order) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_db_pgdir *pgdir; + int ret = 0; + 
+ mutex_lock(&priv->pgdir_mutex); + + list_for_each_entry(pgdir, &priv->pgdir_list, list) + if (!mlx4_alloc_db_from_pgdir(pgdir, db, order)) + goto out; + + pgdir = mlx4_alloc_db_pgdir(dma_device); + if (!pgdir) { + ret = -ENOMEM; + goto out; + } + + list_add(&pgdir->list, &priv->pgdir_list); + + /* This should never fail -- we just allocated an empty page: */ + WARN_ON(mlx4_alloc_db_from_pgdir(pgdir, db, order)); + +out: + mutex_unlock(&priv->pgdir_mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(mlx4_db_alloc); + +void mlx4_db_free(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + int o; + int i; + + mutex_lock(&priv->pgdir_mutex); + + o = db->order; + i = db->index; + + if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { + clear_bit(i ^ 1, db->u.pgdir->order0); + ++o; + } + i >>= o; + set_bit(i, db->u.pgdir->bits[o]); + + if (bitmap_full(db->u.pgdir->order1, MLX4_DB_PER_PAGE / 2)) { + dma_free_coherent(dma_device, PAGE_SIZE, + db->u.pgdir->db_page, db->u.pgdir->db_dma); + list_del(&db->u.pgdir->list); + kfree(db->u.pgdir); + } + + mutex_unlock(&priv->pgdir_mutex); +} +EXPORT_SYMBOL_GPL(mlx4_db_free); + +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size, int max_direct) +{ + int err; + + err = mlx4_db_alloc(dev, dma_device, &wqres->db, 1); + if (err) + return err; + *wqres->db.db = 0; + + if (mlx4_buf_alloc(dev, size, max_direct, &wqres->buf)) { + err = -ENOMEM; + goto err_db; + } + + err = mlx4_mtt_init(dev, wqres->buf.npages, wqres->buf.page_shift, + &wqres->mtt); + if (err) + goto err_buf; + err = mlx4_buf_write_mtt(dev, &wqres->mtt, &wqres->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev, &wqres->mtt); +err_buf: + mlx4_buf_free(dev, size, &wqres->buf); +err_db: + mlx4_db_free(dev, dma_device, &wqres->db); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_alloc_hwq_res); + +void 
mlx4_free_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size) +{ + mlx4_mtt_cleanup(dev, &wqres->mtt); + mlx4_buf_free(dev, size, &wqres->buf); + mlx4_db_free(dev, dma_device, &wqres->db); +} +EXPORT_SYMBOL_GPL(mlx4_free_hwq_res); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 49a4aca..3ab9034 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -798,6 +798,9 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&priv->ctx_list); spin_lock_init(&priv->ctx_lock); + INIT_LIST_HEAD(&priv->pgdir_list); + mutex_init(&priv->pgdir_mutex); + /* * Now reset the HCA before we touch the PCI capabilities or * attempt a firmware command, since a boot ROM may have left diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 7333681..a4023c2 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -257,6 +257,9 @@ struct mlx4_priv { struct list_head ctx_list; spinlock_t ctx_lock; + struct list_head pgdir_list; + struct mutex pgdir_mutex; + struct mlx4_fw fw; struct mlx4_cmd cmd; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ff7df1a..0cb92ee 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -37,6 +37,8 @@ #include #include +#include + #include enum { @@ -208,6 +210,44 @@ struct mlx4_mtt { int page_shift; }; +enum { + MLX4_DB_PER_PAGE = PAGE_SIZE / 4 +}; + +struct mlx4_db_pgdir { + struct list_head list; + DECLARE_BITMAP(order0, MLX4_DB_PER_PAGE); + DECLARE_BITMAP(order1, MLX4_DB_PER_PAGE / 2); + unsigned long *bits[2]; + __be32 *db_page; + dma_addr_t db_dma; +}; + +struct mlx4_user_db_page { + struct list_head list; + struct ib_umem *umem; + unsigned long user_virt; + int refcnt; +}; + +struct mlx4_db { + __be32 *db; + union { + struct mlx4_db_pgdir *pgdir; + struct mlx4_user_db_page *user_page; + } u; + dma_addr_t dma; + int index; + int order; +}; + + +struct 
mlx4_hwq_resources { + struct mlx4_db db; + struct mlx4_mtt mtt; + struct mlx4_buf buf; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -341,6 +381,16 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, struct mlx4_buf *buf); +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size, int max_direct); +void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size); + +int mlx4_db_alloc(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db, int order); +void mlx4_db_free(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Fri Apr 18 05:20:07 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Fri, 18 Apr 2008 15:20:07 +0300 Subject: [ofa-general][PATCH] mlx4: Qp range reservation (MP support, Patch 2) Message-ID: <480891F7.8090807@mellanox.co.il> >From 82401698d675e97aca4d3430a0f8a0fea893c64f Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Thu, 17 Apr 2008 15:40:59 +0300 Subject: [PATCH] mlx4: Qp range reservation Prior to allocating a QP, one needs to reserve an aligned range of QPs. This change enables allocation of consecutive QPs. 
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/qp.c | 9 ++++ drivers/net/mlx4/alloc.c | 99 ++++++++++++++++++++++++++++++++++++-- drivers/net/mlx4/mlx4.h | 6 ++ drivers/net/mlx4/qp.c | 44 ++++++++++++----- include/linux/mlx4/device.h | 5 ++- 5 files changed, 143 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index e65b8e4..c21a9a3 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -545,6 +545,11 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } } + if (!sqpn) + err = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn); + if (err) + goto err_wrid; + err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp); if (err) goto err_wrid; @@ -655,6 +660,10 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, mlx4_ib_unlock_cqs(send_cq, recv_cq); mlx4_qp_free(dev->dev, &qp->mqp); + + if (!is_sqp(dev, qp)) + mlx4_qp_release_range(dev->dev, qp->mqp.qpn, 1); + mlx4_mtt_cleanup(dev->dev, &qp->mtt); if (is_user) { diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index b6b00eb..52b4af3 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -44,15 +44,18 @@ u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap) spin_lock(&bitmap->lock); - obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last); - if (obj >= bitmap->max) { + obj = find_next_zero_bit(bitmap->table, bitmap->effective_max, + bitmap->last); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; - obj = find_first_zero_bit(bitmap->table, bitmap->max); + obj = find_first_zero_bit(bitmap->table, bitmap->effective_max); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { set_bit(obj, bitmap->table); - bitmap->last = (obj + 1) & (bitmap->max - 1); + bitmap->last = (obj + 1); + if (bitmap->last == bitmap->effective_max) + bitmap->last = 0; obj |= bitmap->top; } else obj = -1; @@ -73,7 
+76,83 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved) +static unsigned long find_aligned_range(unsigned long *bitmap, + u32 start, u32 nbits, + int len, int align) +{ + unsigned long end, i; + +again: + start = ALIGN(start, align); + while ((start < nbits) && test_bit(start, bitmap)) + start += align; + if (start >= nbits) + return -1; + + end = start+len; + if (end > nbits) + return -1; + for (i = start+1; i < end; i++) { + if (test_bit(i, bitmap)) { + start = i+1; + goto again; + } + } + return start; +} + +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) +{ + u32 obj, i; + + if (likely(cnt == 1 && align == 1)) + return mlx4_bitmap_alloc(bitmap); + + spin_lock(&bitmap->lock); + + obj = find_aligned_range(bitmap->table, bitmap->last, + bitmap->effective_max, cnt, align); + if (obj >= bitmap->effective_max) { + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + obj = find_aligned_range(bitmap->table, 0, + bitmap->effective_max, + cnt, align); + } + + if (obj < bitmap->effective_max) { + for (i = 0; i < cnt; i++) + set_bit(obj+i, bitmap->table); + if (obj == bitmap->last) { + bitmap->last = (obj + cnt); + if (bitmap->last >= bitmap->effective_max) + bitmap->last = 0; + } + obj |= bitmap->top; + } else + obj = -1; + + spin_unlock(&bitmap->lock); + + return obj; +} + +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) +{ + u32 i; + + obj &= bitmap->max - 1; + + spin_lock(&bitmap->lock); + for (i = 0; i < cnt; i++) + clear_bit(obj+i, bitmap->table); + bitmap->last = min(bitmap->last, obj); + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + spin_unlock(&bitmap->lock); +} + +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max) { int i; @@ -85,6 +164,7 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 
num, u32 mask, u32 reserved bitmap->top = 0; bitmap->max = num; bitmap->mask = mask; + bitmap->effective_max = effective_max; spin_lock_init(&bitmap->lock); bitmap->table = kzalloc(BITS_TO_LONGS(num) * sizeof (long), GFP_KERNEL); if (!bitmap->table) @@ -96,6 +176,13 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved return 0; } +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) +{ + return mlx4_bitmap_init_with_effective_max(bitmap, num, mask, + reserved, num); +} + void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap) { kfree(bitmap->table); diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index a4023c2..2c69d46 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -111,6 +111,7 @@ struct mlx4_bitmap { u32 last; u32 top; u32 max; + u32 effective_max; u32 mask; spinlock_t lock; unsigned long *table; @@ -287,7 +288,12 @@ static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap); void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index fa24e65..dff8e66 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -147,19 +147,42 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, } EXPORT_SYMBOL_GPL(mlx4_qp_modify); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp) +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base) +{ + struct mlx4_priv 
*priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + int qpn; + + qpn = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align); + if (qpn == -1) + return -ENOMEM; + + *base = qpn; + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_reserve_range); + +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + if (base_qpn < dev->caps.sqp_start + 8) + return; + + mlx4_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt); +} +EXPORT_SYMBOL_GPL(mlx4_qp_release_range); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_qp_table *qp_table = &priv->qp_table; int err; - if (sqpn) - qp->qpn = sqpn; - else { - qp->qpn = mlx4_bitmap_alloc(&qp_table->bitmap); - if (qp->qpn == -1) - return -ENOMEM; - } + if (!qpn) + return -EINVAL; + + qp->qpn = qpn; err = mlx4_table_get(dev, &qp_table->qp_table, qp->qpn); if (err) @@ -208,9 +231,6 @@ err_put_qp: mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); err_out: - if (!sqpn) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); - return err; } EXPORT_SYMBOL_GPL(mlx4_qp_alloc); @@ -240,8 +260,6 @@ void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp) mlx4_table_put(dev, &qp_table->auxc_table, qp->qpn); mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); - if (qp->qpn >= dev->caps.sqp_start + 8) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); } EXPORT_SYMBOL_GPL(mlx4_qp_free); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 0cb92ee..a088c63 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -395,7 +395,10 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp); +int 
mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp); void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp); int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, -- 1.5.4 From terrywatson at live.com Fri Apr 18 02:38:17 2008 From: terrywatson at live.com (terry watson) Date: Fri, 18 Apr 2008 09:38:17 +0000 Subject: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: <48084F4E.3020705@cea.fr> References: <48084F4E.3020705@cea.fr> Message-ID: Thanks for the response. The environment I am testing has two clusters and one switch, with the subnet manager running from the switch. Half the nodes are in one partition and half in the other (ignoring 0xffff), call them partitions A and B. I have access to one node in partition A as root and would like to be able to reconfigure that node locally, and with no access to the switch subnet manager configuration, to be able to access nodes in partition B. After some reading I believe that IBIS from IBUtils should allow me to alter the local p_key table and therefore allow me to access nodes on partition B. I cannot test this until I am on-site and I am formulating a strategy before arrival. If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. If my approach is flawed I would appreciate it if someone could point this out. ________________________________ > Date: Fri, 18 Apr 2008 09:35:42 +0200 > From: philippe.gregoire at cea.fr > To: terrywatson at live.com > CC: general at lists.openfabrics.org > Subject: Re: [ofa-general] Is IBIS only for querying OpenSM? > > terry watson a écrit : > > Hi all, > > I will be performing some testing of partitioning used as a security control. 
Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? Could I then attempt to access a partition that I was originally excluded from accessing? > > I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. > > Thanks, > Dave > _________________________________________________________________ > Discover the new Windows Vista > http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ > general mailing list > general at lists.openfabrics.org _________________________________________________________________ News, entertainment and everything you care about at Live.com. Get it now! http://www.live.com/getstarted.aspx From hrosenstock at xsigo.com Fri Apr 18 07:37:51 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 18 Apr 2008 07:37:51 -0700 Subject: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: References: <48084F4E.3020705@cea.fr> Message-ID: <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> Terry, On Fri, 2008-04-18 at 09:38 +0000, terry watson wrote: > Thanks for the response. 
The environment I am testing has two clusters and one switch, > with the subnet manager running from the switch. Half the nodes are in one partition and > half in the other (ignoring 0xffff), call them partitions A and B. I have access to one > node in partition A as root and would like to be able to reconfigure that node locally, > and with no access to the switch subnet manager configuration, to be able to access nodes > in partition B. In general, this is not a good idea IMO. As Philippe wrote, the SM (is supposed to) own the writing of those tables (rather than some low-level diag utility). Even if you modify the local PKey table, it is possible for the SM to overwrite it. Also, there are several other ramifications of this depending on how the SM deals with partitions. Even if you change things locally, that may not be sufficient, as the peer switch port may do partition filtering, so you may need to change that too, and possibly more PKey tables elsewhere in the network, depending on what your SM does. Also, there are SA responses that depend on the SM having correct knowledge (like PathRecords and others), so the end node may not get any response on that partition for certain things. > After some reading I believe that IBIS from IBUtils should allow me to alter the > local p_key table and therefore allow me to access nodes on partition B. Yes, but it may take more than this for it to work, depending on your SM. > I cannot test this until I am on-site and I am formulating a strategy before arrival. > If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. Some MPIs use out-of-band mechanisms to create connections, so the SA issues may not apply there; but I think the partition ones might, and they are SM dependent, so your mileage may vary... > If my approach is flawed I would appreciate it if someone could point this out. The proper way to do this is by reconfiguring your SM. 
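For reference, with OpenSM that reconfiguration is done through its partition configuration file; the switch-embedded SM in your setup will have its own equivalent mechanism. A hypothetical sketch in OpenSM partitions.conf syntax (the partition names and port GUIDs below are invented placeholders):

```
# /etc/opensm/partitions.conf (sketch; GUIDs are placeholders)
Default=0x7fff, ipoib : ALL, SELF=full;
PartA=0x8001 : 0x0002c90200001111=full, 0x0002c90200002222=full;
PartB=0x8002 : 0x0002c90200003333=full, 0x0002c90200004444=full;
```

Each line is PartitionName=PKey : port-membership-list; ports listed with =full are full members, and the SM programs the corresponding PKey tables from this file.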
-- Hal > ________________________________ > > Date: Fri, 18 Apr 2008 09:35:42 +0200 > > From: philippe.gregoire at cea.fr > > To: terrywatson at live.com > > CC: general at lists.openfabrics.org > > Subject: Re: [ofa-general] Is IBIS only for querying OpenSM? > > > > terry watson a écrit : > > > > Hi all, > > > > I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? Could I then attempt to access a partition that I was originally excluded from accessing? > > > > I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. > > > > Thanks, > > Dave > > _________________________________________________________________ > > Discover the new Windows Vista > > http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ > > general mailing list > > general at lists.openfabrics.org > _________________________________________________________________ > News, entertainment and everything you care about at Live.com. Get it now! 
> http://www.live.com/getstarted.aspx_______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Fri Apr 18 09:19:35 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 09:19:35 -0700 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) In-Reply-To: <48089127.2040905@mellanox.co.il> (Yevgeny Petrilin's message of "Fri, 18 Apr 2008 15:16:39 +0300") References: <48089127.2040905@mellanox.co.il> Message-ID: > + INIT_LIST_HEAD(&priv->pgdir_list); > + mutex_init(&priv->pgdir_mutex); Your patch adds pgdir_list to core but doesn't remove it from mlx4_ib. > - err = mlx4_ib_db_alloc(dev, &cq->db, 1); > + err = mlx4_db_alloc(dev->dev, dev->ib_dev.dma_device, &cq->db, 1); > +int mlx4_db_alloc(struct mlx4_dev *dev, struct device *dma_device, > + struct mlx4_db *db, int order) I must be missing something but why do you add the dma_device parameter here? When would a consumer ever want to pass something other than dev->pdev->dev? > +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, > + struct device *dma_device, int size, int max_direct) This is adding a separate API beyond just moving the doorbell stuff to mlx4_core. Please separate this still further into another patch. Can mlx4_ib use this interface too? - R. From terrywatson at live.com Fri Apr 18 08:25:31 2008 From: terrywatson at live.com (terry watson) Date: Fri, 18 Apr 2008 15:25:31 +0000 Subject: ***SPAM*** RE: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> References: <48084F4E.3020705@cea.fr> <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> Message-ID: Thanks Hal. 
I appreciate using the SM is the correct means of controlling partitioning; however, the testing I am performing is assessing security vulnerabilities. In this case, the two clusters are separated by partitioning only and I am seeking to assess the ability of a user to obtain unauthorised access to one cluster from the other. The requirement for the vendor building the two clusters was that they were isolated from each other. They have chosen to use one switch and I have to assess if this provides adequate isolation, as per the client's security requirements. At this stage of my investigation, I do not believe partitioning on a switch provides adequate separation / isolation to be used as a security control and two physical switches will need to be used to provide the complete isolation that is required. But my task is to prove this to justify the expense.... :) I value any comments or input on this topic. ---------------------------------------- > Subject: Re: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? > From: hrosenstock at xsigo.com > To: terrywatson at live.com > CC: philippe.gregoire at cea.fr; general at lists.openfabrics.org > Date: Fri, 18 Apr 2008 07:37:51 -0700 > > Terry, > > On Fri, 2008-04-18 at 09:38 +0000, terry watson wrote: >> Thanks for the response. The environment I am testing has two clusters and one switch, >> with the subnet manager running from the switch. Half the nodes are in one partition and >> half in the other (ignoring 0xffff), call them partitions A and B. I have access to one >> node in partition A as root and would like to be able to reconfigure that node locally, >> and with no access to the switch subnet manager configuration, to be able to access nodes >> in partition B. > > In general, this is not a good idea IMO. As Philippe wrote, the SM (is > supposed to) own the writing of those tables (rather than some low level > diag utility). 
Even if you modify the local PKey table, it is possible > for the SM to overwrite this. Also, there are several other > ramifications of this depending on how the SM deals with partitions. > Even if you change things locally, that may not be sufficient as the > peer switch port may do partition filtering so that may need to change > that too and possible more PKey tables in the network depending on what > your SM does. Also, there are SA responses that depend on the SM having > correct knowledge (like PathRecords and others) so the end node may not > get any response on that partition for certain things. > >> After some reading I believe that IBIS from IBUtils should allow me to alter the >> local p_key table and therefore allow me to access nodes on partition B. > > Yes but it may take more than this for it to work depending on your SM. > >> I cannot test this until I am on-site and I am formulating a strategy before arrival. >> If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. > > Some MPIs use out of band mechanisms to create connections so the SA > issues may not apply there; but I think the partition ones might and are > SM dependent so your mileage may vary... > >> If my approach is flawed I would appreciate it if someone could point this out. > > The proper way to do this is by reconfiguring your SM. > > -- Hal > >> ________________________________ >>> Date: Fri, 18 Apr 2008 09:35:42 +0200 >>> From: philippe.gregoire at cea.fr >>> To: terrywatson at live.com >>> CC: general at lists.openfabrics.org >>> Subject: Re: [ofa-general] Is IBIS only for querying OpenSM? >>> >>> terry watson a écrit : >>> >>> Hi all, >>> >>> I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? 
Could I then attempt to access a partition that I was originally excluded from accessing? >>> >>> I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. >>> >>> Thanks, >>> Dave >>> _________________________________________________________________ >>> Discover the new Windows Vista >>> http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ >>> general mailing list >>> general at lists.openfabrics.org >> _________________________________________________________________ >> News, entertainment and everything you care about at Live.com. Get it now! >> http://www.live.com/getstarted.aspx_______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > _________________________________________________________________ Connect to the next generation of MSN Messenger  http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline From hrosenstock at xsigo.com Fri Apr 18 12:12:18 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 18 Apr 2008 12:12:18 -0700 Subject: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: References: <48084F4E.3020705@cea.fr> <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> Message-ID: <1208545938.26936.365.camel@hrosenstock-ws.xsigo.com> Terry, On Fri, 2008-04-18 at 15:25 +0000, terry watson wrote: > Thanks Hal. I appreciate using the SM is the correct means of controlling partitioning; however, the testing I am performing is assessing security vulnerabilities. 
In this case, the two clusters are separated by partitioning only and I am seeking to assess the ability of a user to obtain unauthorised access to one cluster from the other. The requirement for the vendor building the two clusters was that they were isolated from each other. They have chosen to use one switch and I have to assess if this provides adequate isolation, as per the client's security requirements. > > At this stage of my investigation, I do not believe partitioning on a switch provides adequate separation / isolation to be used as a security control and two physical switches will need to be used to provide the complete isolation that is required. But my task is to prove this to justify the expense.... :) > > I value any comments or input on this topic. One pertinent thing here is whether a MKey manager is supported in the SM, and if so, what level of MKeying is used. Sufficient MKey protection with a sophisticated manager could make the updates of such PKey tables difficult but not impossible. Currently, OpenSM does not support an MKey manager but one is being proposed for the next OFED cycle. Currently, OpenSM supports a static configured MKey and MKey lease period which could make things marginally better if you are concerned with rogue updates like this. Not sure about the third party (vendor) SMs in this regard. Contact your vendor if this is of interest. -- Hal > ---------------------------------------- > > Subject: Re: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? > > From: hrosenstock at xsigo.com > > To: terrywatson at live.com > > CC: philippe.gregoire at cea.fr; general at lists.openfabrics.org > > Date: Fri, 18 Apr 2008 07:37:51 -0700 > > > > Terry, > > > > On Fri, 2008-04-18 at 09:38 +0000, terry watson wrote: > >> Thanks for the response. The environment I am testing has two clusters and one switch, > >> with the subnet manager running from the switch. 
Half the nodes are in one partition and > >> half in the other (ignoring 0xffff), call them partitions A and B. I have access to one > >> node in partition A as root and would like to be able to reconfigure that node locally, > >> and with no access to the switch subnet manager configuration, to be able to access nodes > >> in partition B. > > > > In general, this is not a good idea IMO. As Philippe wrote, the SM (is > > supposed to) own the writing of those tables (rather than some low level > > diag utility). Even if you modify the local PKey table, it is possible > > for the SM to overwrite this. Also, there are several other > > ramifications of this depending on how the SM deals with partitions. > > Even if you change things locally, that may not be sufficient as the > > peer switch port may do partition filtering so that may need to change > > that too and possible more PKey tables in the network depending on what > > your SM does. Also, there are SA responses that depend on the SM having > > correct knowledge (like PathRecords and others) so the end node may not > > get any response on that partition for certain things. > > > >> After some reading I believe that IBIS from IBUtils should allow me to alter the > >> local p_key table and therefore allow me to access nodes on partition B. > > > > Yes but it may take more than this for it to work depending on your SM. > > > >> I cannot test this until I am on-site and I am formulating a strategy before arrival. > >> If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. > > > > Some MPIs use out of band mechanisms to create connections so the SA > > issues may not apply there; but I think the partition ones might and are > > SM dependent so your mileage may vary... > > > >> If my approach is flawed I would appreciate it if someone could point this out. > > > > The proper way to do this is by reconfiguring your SM. 
> > > > -- Hal > > > >> ________________________________ > >>> Date: Fri, 18 Apr 2008 09:35:42 +0200 > >>> From: philippe.gregoire at cea.fr > >>> To: terrywatson at live.com > >>> CC: general at lists.openfabrics.org > >>> Subject: Re: [ofa-general] Is IBIS only for querying OpenSM? > >>> > >>> terry watson a écrit : > >>> > >>> Hi all, > >>> > >>> I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? Could I then attempt to access a partition that I was originally excluded from accessing? > >>> > >>> I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. > >>> > >>> Thanks, > >>> Dave > >>> _________________________________________________________________ > >>> Discover the new Windows Vista > >>> http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ > >>> general mailing list > >>> general at lists.openfabrics.org > >> _________________________________________________________________ > >> News, entertainment and everything you care about at Live.com. Get it now! 
> >> http://www.live.com/getstarted.aspx_______________________________________________ > >> general mailing list > >> general at lists.openfabrics.org > >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >> > >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > _________________________________________________________________ > Connect to the next generation of MSN Messenger > http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline From rdreier at cisco.com Fri Apr 18 13:08:53 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 13:08:53 -0700 Subject: [ofa-general] [PATCH/RFC] RDMA/nes: Remove unneeded function declarations Message-ID: Remove redundant static declarations of functions that are defined before they are used in the source. Signed-off-by: Roland Dreier --- diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index b00b0e3..b046262 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -96,12 +96,6 @@ static LIST_HEAD(nes_dev_list); atomic_t qps_destroyed; -static void nes_print_macaddr(struct net_device *netdev); -static irqreturn_t nes_interrupt(int, void *); -static int __devinit nes_probe(struct pci_dev *, const struct pci_device_id *); -static void __devexit nes_remove(struct pci_dev *); -static int __init nes_init_module(void); -static void __exit nes_exit_module(void); static unsigned int ee_flsh_adapter; static unsigned int sysfs_nonidx_addr; static unsigned int sysfs_idx_addr; diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 3416664..01cd0ef 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -92,15 +92,6 @@ static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN; static int debug = -1; - -static int nes_netdev_open(struct net_device *); 
-static int nes_netdev_stop(struct net_device *); -static int nes_netdev_start_xmit(struct sk_buff *, struct net_device *); -static struct net_device_stats *nes_netdev_get_stats(struct net_device *); -static void nes_netdev_tx_timeout(struct net_device *); -static int nes_netdev_set_mac_address(struct net_device *, void *); -static int nes_netdev_change_mtu(struct net_device *, int); - /** * nes_netdev_poll */ From rdreier at cisco.com Fri Apr 18 13:12:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 13:12:25 -0700 Subject: [ofa-general][PATCH] mlx4: Qp range reservation (MP support, Patch 2) In-Reply-To: <480891F7.8090807@mellanox.co.il> (Yevgeny Petrilin's message of "Fri, 18 Apr 2008 15:20:07 +0300") References: <480891F7.8090807@mellanox.co.il> Message-ID: > +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, > + u32 num, u32 mask, u32 reserved, > + u32 effective_max) This patch adds effective_max stuff but I don't see how it's used anywhere?? - R. From gstreiff at NetEffect.com Fri Apr 18 14:42:51 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Fri, 18 Apr 2008 16:42:51 -0500 Subject: [ofa-general] RE: [PATCH/RFC] RDMA/nes: Remove unneeded function declarations In-Reply-To: Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC07950108@venom2> Acked-by: Glenn Streiff Thanks. > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Friday, April 18, 2008 3:09 PM > To: general at lists.openfabrics.org > Cc: Faisal Latif; Nishi Gupta; Glenn Streiff > Subject: [PATCH/RFC] RDMA/nes: Remove unneeded function declarations > > > Remove redundant static declarations of functions that are defined > before they are used in the source. 
> > Signed-off-by: Roland Dreier > --- > diff --git a/drivers/infiniband/hw/nes/nes.c > b/drivers/infiniband/hw/nes/nes.c > index b00b0e3..b046262 100644 > --- a/drivers/infiniband/hw/nes/nes.c > +++ b/drivers/infiniband/hw/nes/nes.c > @@ -96,12 +96,6 @@ static LIST_HEAD(nes_dev_list); > > atomic_t qps_destroyed; > > -static void nes_print_macaddr(struct net_device *netdev); > -static irqreturn_t nes_interrupt(int, void *); > -static int __devinit nes_probe(struct pci_dev *, const > struct pci_device_id *); > -static void __devexit nes_remove(struct pci_dev *); > -static int __init nes_init_module(void); > -static void __exit nes_exit_module(void); > static unsigned int ee_flsh_adapter; > static unsigned int sysfs_nonidx_addr; > static unsigned int sysfs_idx_addr; > diff --git a/drivers/infiniband/hw/nes/nes_nic.c > b/drivers/infiniband/hw/nes/nes_nic.c > index 3416664..01cd0ef 100644 > --- a/drivers/infiniband/hw/nes/nes_nic.c > +++ b/drivers/infiniband/hw/nes/nes_nic.c > @@ -92,15 +92,6 @@ static const u32 default_msg = > NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK > | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN; > static int debug = -1; > > - > -static int nes_netdev_open(struct net_device *); > -static int nes_netdev_stop(struct net_device *); > -static int nes_netdev_start_xmit(struct sk_buff *, struct > net_device *); > -static struct net_device_stats *nes_netdev_get_stats(struct > net_device *); > -static void nes_netdev_tx_timeout(struct net_device *); > -static int nes_netdev_set_mac_address(struct net_device *, void *); > -static int nes_netdev_change_mtu(struct net_device *, int); > - > /** > * nes_netdev_poll > */ > From rdreier at cisco.com Fri Apr 18 14:54:59 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 14:54:59 -0700 Subject: [ofa-general] Re: [PATCH v2] Add enum strings and *_str functions for enums In-Reply-To: <20080415133548.414aeaea.weiny2@llnl.gov> (Ira Weiny's message of "Tue, 15 Apr 2008 13:35:48 -0700") References: 
<20080415094750.35afc0e5.weiny2@llnl.gov> <20080415133548.414aeaea.weiny2@llnl.gov> Message-ID: Thanks, I added a man page and changed things a little and committed the following: commit 1c0b7ac0a6bbbe4d246ef4cf50ae31bde4929ba3 Author: Ira Weiny Date: Tue Apr 15 13:35:48 2008 -0700 Add functions to convert enum values to strings Add ibv_xxx_str() functions to convert node type, port state, event type and wc status enum values to strings. Signed-off-by: Ira K. Weiny Signed-off-by: Roland Dreier diff --git a/Makefile.am b/Makefile.am index 705b184..9b05306 100644 --- a/Makefile.am +++ b/Makefile.am @@ -9,7 +9,8 @@ src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibve libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@ src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \ - src/marshall.c src/memory.c src/sysfs.c src/verbs.c + src/marshall.c src/memory.c src/sysfs.c src/verbs.c \ + src/enum_strs.c src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \ $(libibverbs_version_script) src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map @@ -38,20 +39,20 @@ libibverbsinclude_HEADERS = include/infiniband/arch.h include/infiniband/driver. 
include/infiniband/kern-abi.h include/infiniband/opcode.h include/infiniband/verbs.h \ include/infiniband/sa-kern-abi.h include/infiniband/sa.h include/infiniband/marshall.h -man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \ - man/ibv_rc_pingpong.1 man/ibv_uc_pingpong.1 man/ibv_ud_pingpong.1 \ - man/ibv_srq_pingpong.1 \ - man/ibv_alloc_pd.3 man/ibv_attach_mcast.3 man/ibv_create_ah.3 \ - man/ibv_create_ah_from_wc.3 man/ibv_create_comp_channel.3 \ - man/ibv_create_cq.3 man/ibv_create_qp.3 man/ibv_create_srq.3 \ - man/ibv_fork_init.3 man/ibv_get_async_event.3 \ - man/ibv_get_cq_event.3 man/ibv_get_device_guid.3 \ - man/ibv_get_device_list.3 man/ibv_get_device_name.3 \ - man/ibv_modify_qp.3 man/ibv_modify_srq.3 man/ibv_open_device.3 \ - man/ibv_poll_cq.3 man/ibv_post_recv.3 man/ibv_post_send.3 \ - man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \ - man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3 \ - man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \ +man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \ + man/ibv_rc_pingpong.1 man/ibv_uc_pingpong.1 man/ibv_ud_pingpong.1 \ + man/ibv_srq_pingpong.1 man/ibv_alloc_pd.3 man/ibv_attach_mcast.3 \ + man/ibv_create_ah.3 man/ibv_create_ah_from_wc.3 \ + man/ibv_create_comp_channel.3 man/ibv_create_cq.3 \ + man/ibv_create_qp.3 man/ibv_create_srq.3 man/ibv_event_type_str.3 \ + man/ibv_fork_init.3 man/ibv_get_async_event.3 \ + man/ibv_get_cq_event.3 man/ibv_get_device_guid.3 \ + man/ibv_get_device_list.3 man/ibv_get_device_name.3 \ + man/ibv_modify_qp.3 man/ibv_modify_srq.3 man/ibv_open_device.3 \ + man/ibv_poll_cq.3 man/ibv_post_recv.3 man/ibv_post_send.3 \ + man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \ + man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3 \ + man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \ man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 DEBIAN = debian/changelog debian/compat debian/control 
debian/copyright \ @@ -84,6 +85,8 @@ install-data-hook: $(RM) ibv_free_device_list.3 && \ $(RM) ibv_init_ah_from_wc.3 && \ $(RM) mult_to_ibv_rate.3 && \ + $(RM) ibv_node_type_str.3 && \ + $(RM) ibv_port_state_str.3 && \ $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ $(LN_S) ibv_get_cq_event.3 ibv_ack_cq_events.3 && \ $(LN_S) ibv_open_device.3 ibv_close_device.3 && \ @@ -97,5 +100,6 @@ install-data-hook: $(LN_S) ibv_attach_mcast.3 ibv_detach_mcast.3 && \ $(LN_S) ibv_get_device_list.3 ibv_free_device_list.3 && \ $(LN_S) ibv_create_ah_from_wc.3 ibv_init_ah_from_wc.3 && \ - $(LN_S) ibv_rate_to_mult.3 mult_to_ibv_rate.3 - + $(LN_S) ibv_rate_to_mult.3 mult_to_ibv_rate.3 && \ + $(LN_S) ibv_event_type_str.3 ibv_node_type_str.3 && \ + $(LN_S) ibv_event_type_str.3 ibv_port_state_str.3 diff --git a/examples/devinfo.c b/examples/devinfo.c index 4e4316a..1fadc80 100644 --- a/examples/devinfo.c +++ b/examples/devinfo.c @@ -67,17 +67,6 @@ static const char *guid_str(uint64_t node_guid, char *str) return str; } -static const char *port_state_str(enum ibv_port_state pstate) -{ - switch (pstate) { - case IBV_PORT_DOWN: return "PORT_DOWN"; - case IBV_PORT_INIT: return "PORT_INIT"; - case IBV_PORT_ARMED: return "PORT_ARMED"; - case IBV_PORT_ACTIVE: return "PORT_ACTIVE"; - default: return "invalid state"; - } -} - static const char *port_phy_state_str(uint8_t phys_state) { switch (phys_state) { @@ -266,7 +255,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) } printf("\t\tport:\t%d\n", port); printf("\t\t\tstate:\t\t\t%s (%d)\n", - port_state_str(port_attr.state), port_attr.state); + ibv_port_state_str(port_attr.state), port_attr.state); printf("\t\t\tmax_mtu:\t\t%s (%d)\n", mtu_str(port_attr.max_mtu), port_attr.max_mtu); printf("\t\t\tactive_mtu:\t\t%s (%d)\n", diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c index 7181914..26fa45c 100644 --- a/examples/rc_pingpong.c +++ b/examples/rc_pingpong.c @@ -709,7 +709,8 @@ int main(int argc, char 
*argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c index bc869c9..95bebf4 100644 --- a/examples/srq_pingpong.c +++ b/examples/srq_pingpong.c @@ -805,7 +805,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/uc_pingpong.c b/examples/uc_pingpong.c index 6135030..c09c8c1 100644 --- a/examples/uc_pingpong.c +++ b/examples/uc_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c index aaee26c..8f3d50b 100644 --- a/examples/ud_pingpong.c +++ b/examples/ud_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index a51bb9d..a04cc62 100644 --- a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -238,6 +238,7 @@ enum ibv_wc_status { IBV_WC_RESP_TIMEOUT_ERR, IBV_WC_GENERAL_ERR }; +const char *ibv_wc_status_str(enum ibv_wc_status status); enum ibv_wc_opcode { IBV_WC_SEND, @@ 
-1077,6 +1078,21 @@ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); */ int ibv_fork_init(void); +/** + * ibv_node_type_str - Return string describing node_type enum value + */ +const char *ibv_node_type_str(enum ibv_node_type node_type); + +/** + * ibv_port_state_str - Return string describing port_state enum value + */ +const char *ibv_port_state_str(enum ibv_port_state port_state); + +/** + * ibv_event_type_str - Return string describing event_type enum value + */ +const char *ibv_event_type_str(enum ibv_event_type event); + END_C_DECLS # undef __attribute_const diff --git a/man/ibv_event_type_str.3 b/man/ibv_event_type_str.3 new file mode 100644 index 0000000..0df8fcd --- /dev/null +++ b/man/ibv_event_type_str.3 @@ -0,0 +1,40 @@ +.\" -*- nroff -*- +.\" +.TH IBV_EVENT_TYPE_STR 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" +.SH "NAME" +.nf +ibv_event_type_str \- Return string describing event_type enum value +.nl +ibv_node_type_str \- Return string describing node_type enum value +.nl +ibv_port_state_str \- Return string describing port_state enum value +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "const char *ibv_event_type_str(enum ibv_event_type " "event_type"); +.nl +.BI "const char *ibv_node_type_str(enum ibv_node_type " "node_type"); +.nl +.BI "const char *ibv_port_state_str(enum ibv_port_state " "port_state"); +.fi +.SH "DESCRIPTION" +.B ibv_node_type_str() +returns a string describing the node type enum value +.IR node_type . +.PP +.B ibv_port_state_str() +returns a string describing the port state enum value +.IR port_state . +.PP +.B ibv_event_type_str() +returns a string describing the event type enum value +.IR event_type . +.SH "RETURN VALUE" +These functions return a constant string that describes the enum value +passed as their argument. 
+.SH "AUTHOR" +.TP +Roland Dreier +.RI < rolandd at cisco.com > + diff --git a/src/enum_strs.c b/src/enum_strs.c new file mode 100644 index 0000000..c57feaa --- /dev/null +++ b/src/enum_strs.c @@ -0,0 +1,127 @@ +/* + * Copyright (c) 2008 Lawrence Livermore National Laboratory + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include + +const char *ibv_node_type_str(enum ibv_node_type node_type) +{ + static const char *const node_type_str[] = { + [IBV_NODE_CA] = "InfiniBand channel adapter", + [IBV_NODE_SWITCH] = "InfiniBand switch", + [IBV_NODE_ROUTER] = "InfiniBand router", + [IBV_NODE_RNIC] = "iWARP NIC" + }; + + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) + return "unknown"; + + return node_type_str[node_type]; +} + +const char *ibv_port_state_str(enum ibv_port_state port_state) +{ + static const char *const port_state_str[] = { + [IBV_PORT_NOP] = "no state change (NOP)", + [IBV_PORT_DOWN] = "down", + [IBV_PORT_INIT] = "init", + [IBV_PORT_ARMED] = "armed", + [IBV_PORT_ACTIVE] = "active", + [IBV_PORT_ACTIVE_DEFER] = "active defer" + }; + + if (port_state < IBV_PORT_NOP || port_state > IBV_PORT_ACTIVE_DEFER) + return "unknown"; + + return port_state_str[port_state]; +} + +const char *ibv_event_type_str(enum ibv_event_type event) +{ + static const char *const event_type_str[] = { + [IBV_EVENT_CQ_ERR] = "CQ error", + [IBV_EVENT_QP_FATAL] = "local work queue catastrophic error", + [IBV_EVENT_QP_REQ_ERR] = "invalid request local work queue error", + [IBV_EVENT_QP_ACCESS_ERR] = "local access violation work queue error", + [IBV_EVENT_COMM_EST] = "communication established", + [IBV_EVENT_SQ_DRAINED] = "send queue drained", + [IBV_EVENT_PATH_MIG] = "path migrated", + [IBV_EVENT_PATH_MIG_ERR] = "path migration request error", + [IBV_EVENT_DEVICE_FATAL] = "local catastrophic error", + [IBV_EVENT_PORT_ACTIVE] = "port active", + [IBV_EVENT_PORT_ERR] = "port error", + [IBV_EVENT_LID_CHANGE] = "LID change", + [IBV_EVENT_PKEY_CHANGE] = "P_Key change", + [IBV_EVENT_SM_CHANGE] = "SM change", + [IBV_EVENT_SRQ_ERR] = "SRQ catastrophic error", + [IBV_EVENT_SRQ_LIMIT_REACHED] = "SRQ limit reached", + [IBV_EVENT_QP_LAST_WQE_REACHED] = "last WQE reached", + [IBV_EVENT_CLIENT_REREGISTER] = "client reregistration", + }; + + if (event < IBV_EVENT_CQ_ERR || event > 
IBV_EVENT_CLIENT_REREGISTER) + return "unknown"; + + return event_type_str[event]; +} + +const char *ibv_wc_status_str(enum ibv_wc_status status) +{ + static const char *const wc_status_str[] = { + [IBV_WC_SUCCESS] = "success", + [IBV_WC_LOC_LEN_ERR] = "local length error", + [IBV_WC_LOC_QP_OP_ERR] = "local QP operation error", + [IBV_WC_LOC_EEC_OP_ERR] = "local EE context operation error", + [IBV_WC_LOC_PROT_ERR] = "local protection error", + [IBV_WC_WR_FLUSH_ERR] = "Work Request Flushed Error", + [IBV_WC_MW_BIND_ERR] = "memory management operation error", + [IBV_WC_BAD_RESP_ERR] = "bad response error", + [IBV_WC_LOC_ACCESS_ERR] = "local access error", + [IBV_WC_REM_INV_REQ_ERR] = "remote invalid request error", + [IBV_WC_REM_ACCESS_ERR] = "remote access error", + [IBV_WC_REM_OP_ERR] = "remote operation error", + [IBV_WC_RETRY_EXC_ERR] = "transport retry counter exceeded", + [IBV_WC_RNR_RETRY_EXC_ERR] = "RNR retry counter exceeded", + [IBV_WC_LOC_RDD_VIOL_ERR] = "local RDD violation error", + [IBV_WC_REM_INV_RD_REQ_ERR] = "remote invalid RD request", + [IBV_WC_REM_ABORT_ERR] = "aborted error", + [IBV_WC_INV_EECN_ERR] = "invalid EE context number", + [IBV_WC_INV_EEC_STATE_ERR] = "invalid EE context state", + [IBV_WC_FATAL_ERR] = "fatal error", + [IBV_WC_RESP_TIMEOUT_ERR] = "response timeout error", + [IBV_WC_GENERAL_ERR] = "general error" + }; + + if (status < IBV_WC_SUCCESS || status > IBV_WC_GENERAL_ERR) + return "unknown"; + + return wc_status_str[status]; +} diff --git a/src/libibverbs.map b/src/libibverbs.map index 3a346ed..1827da0 100644 --- a/src/libibverbs.map +++ b/src/libibverbs.map @@ -91,4 +91,9 @@ IBVERBS_1.1 { ibv_dontfork_range; ibv_dofork_range; ibv_register_driver; + + ibv_node_type_str; + ibv_port_state_str; + ibv_event_type_str; + ibv_wc_status_str; } IBVERBS_1.0; From weiny2 at llnl.gov Fri Apr 18 15:41:30 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 18 Apr 2008 15:41:30 -0700 Subject: [ofa-general] Re: [PATCH v2] Add enum strings and 
*_str functions for enums In-Reply-To: References: <20080415094750.35afc0e5.weiny2@llnl.gov> <20080415133548.414aeaea.weiny2@llnl.gov> Message-ID: <20080418154130.33a8917b.weiny2@llnl.gov> Thanks, Ira On Fri, 18 Apr 2008 14:54:59 -0700 Roland Dreier wrote: > Thanks, I added a man page and changed things a little and committed the > following: > > commit 1c0b7ac0a6bbbe4d246ef4cf50ae31bde4929ba3 > Author: Ira Weiny > Date: Tue Apr 15 13:35:48 2008 -0700 > > Add functions to convert enum values to strings > > Add ibv_xxx_str() functions to convert node type, port state, event > type and wc status enum values to strings. > > Signed-off-by: Ira K. Weiny > Signed-off-by: Roland Dreier > > diff --git a/Makefile.am b/Makefile.am > index 705b184..9b05306 100644 > --- a/Makefile.am > +++ b/Makefile.am > @@ -9,7 +9,8 @@ src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibve > libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@ > > src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \ > - src/marshall.c src/memory.c src/sysfs.c src/verbs.c > + src/marshall.c src/memory.c src/sysfs.c src/verbs.c \ > + src/enum_strs.c > src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \ > $(libibverbs_version_script) > src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map > @@ -38,20 +39,20 @@ libibverbsinclude_HEADERS = include/infiniband/arch.h include/infiniband/driver. 
> include/infiniband/kern-abi.h include/infiniband/opcode.h include/infiniband/verbs.h \ > include/infiniband/sa-kern-abi.h include/infiniband/sa.h include/infiniband/marshall.h > > -man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \ > - man/ibv_rc_pingpong.1 man/ibv_uc_pingpong.1 man/ibv_ud_pingpong.1 \ > - man/ibv_srq_pingpong.1 \ > - man/ibv_alloc_pd.3 man/ibv_attach_mcast.3 man/ibv_create_ah.3 \ > - man/ibv_create_ah_from_wc.3 man/ibv_create_comp_channel.3 \ > - man/ibv_create_cq.3 man/ibv_create_qp.3 man/ibv_create_srq.3 \ > - man/ibv_fork_init.3 man/ibv_get_async_event.3 \ > - man/ibv_get_cq_event.3 man/ibv_get_device_guid.3 \ > - man/ibv_get_device_list.3 man/ibv_get_device_name.3 \ > - man/ibv_modify_qp.3 man/ibv_modify_srq.3 man/ibv_open_device.3 \ > - man/ibv_poll_cq.3 man/ibv_post_recv.3 man/ibv_post_send.3 \ > - man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \ > - man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3 \ > - man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \ > +man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \ > + man/ibv_rc_pingpong.1 man/ibv_uc_pingpong.1 man/ibv_ud_pingpong.1 \ > + man/ibv_srq_pingpong.1 man/ibv_alloc_pd.3 man/ibv_attach_mcast.3 \ > + man/ibv_create_ah.3 man/ibv_create_ah_from_wc.3 \ > + man/ibv_create_comp_channel.3 man/ibv_create_cq.3 \ > + man/ibv_create_qp.3 man/ibv_create_srq.3 man/ibv_event_type_str.3 \ > + man/ibv_fork_init.3 man/ibv_get_async_event.3 \ > + man/ibv_get_cq_event.3 man/ibv_get_device_guid.3 \ > + man/ibv_get_device_list.3 man/ibv_get_device_name.3 \ > + man/ibv_modify_qp.3 man/ibv_modify_srq.3 man/ibv_open_device.3 \ > + man/ibv_poll_cq.3 man/ibv_post_recv.3 man/ibv_post_send.3 \ > + man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \ > + man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3 \ > + man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \ > man/ibv_req_notify_cq.3 
man/ibv_resize_cq.3 > > DEBIAN = debian/changelog debian/compat debian/control debian/copyright \ > @@ -84,6 +85,8 @@ install-data-hook: > $(RM) ibv_free_device_list.3 && \ > $(RM) ibv_init_ah_from_wc.3 && \ > $(RM) mult_to_ibv_rate.3 && \ > + $(RM) ibv_node_type_str.3 && \ > + $(RM) ibv_port_state_str.3 && \ > $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ > $(LN_S) ibv_get_cq_event.3 ibv_ack_cq_events.3 && \ > $(LN_S) ibv_open_device.3 ibv_close_device.3 && \ > @@ -97,5 +100,6 @@ install-data-hook: > $(LN_S) ibv_attach_mcast.3 ibv_detach_mcast.3 && \ > $(LN_S) ibv_get_device_list.3 ibv_free_device_list.3 && \ > $(LN_S) ibv_create_ah_from_wc.3 ibv_init_ah_from_wc.3 && \ > - $(LN_S) ibv_rate_to_mult.3 mult_to_ibv_rate.3 > - > + $(LN_S) ibv_rate_to_mult.3 mult_to_ibv_rate.3 && \ > + $(LN_S) ibv_event_type_str.3 ibv_node_type_str.3 && \ > + $(LN_S) ibv_event_type_str.3 ibv_port_state_str.3 > diff --git a/examples/devinfo.c b/examples/devinfo.c > index 4e4316a..1fadc80 100644 > --- a/examples/devinfo.c > +++ b/examples/devinfo.c > @@ -67,17 +67,6 @@ static const char *guid_str(uint64_t node_guid, char *str) > return str; > } > > -static const char *port_state_str(enum ibv_port_state pstate) > -{ > - switch (pstate) { > - case IBV_PORT_DOWN: return "PORT_DOWN"; > - case IBV_PORT_INIT: return "PORT_INIT"; > - case IBV_PORT_ARMED: return "PORT_ARMED"; > - case IBV_PORT_ACTIVE: return "PORT_ACTIVE"; > - default: return "invalid state"; > - } > -} > - > static const char *port_phy_state_str(uint8_t phys_state) > { > switch (phys_state) { > @@ -266,7 +255,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) > } > printf("\t\tport:\t%d\n", port); > printf("\t\t\tstate:\t\t\t%s (%d)\n", > - port_state_str(port_attr.state), port_attr.state); > + ibv_port_state_str(port_attr.state), port_attr.state); > printf("\t\t\tmax_mtu:\t\t%s (%d)\n", > mtu_str(port_attr.max_mtu), port_attr.max_mtu); > printf("\t\t\tactive_mtu:\t\t%s (%d)\n", > diff --git 
a/examples/rc_pingpong.c b/examples/rc_pingpong.c > index 7181914..26fa45c 100644 > --- a/examples/rc_pingpong.c > +++ b/examples/rc_pingpong.c > @@ -709,7 +709,8 @@ int main(int argc, char *argv[]) > > for (i = 0; i < ne; ++i) { > if (wc[i].status != IBV_WC_SUCCESS) { > - fprintf(stderr, "Failed status %d for wr_id %d\n", > + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", > + ibv_wc_status_str(wc[i].status), > wc[i].status, (int) wc[i].wr_id); > return 1; > } > diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c > index bc869c9..95bebf4 100644 > --- a/examples/srq_pingpong.c > +++ b/examples/srq_pingpong.c > @@ -805,7 +805,8 @@ int main(int argc, char *argv[]) > > for (i = 0; i < ne; ++i) { > if (wc[i].status != IBV_WC_SUCCESS) { > - fprintf(stderr, "Failed status %d for wr_id %d\n", > + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", > + ibv_wc_status_str(wc[i].status), > wc[i].status, (int) wc[i].wr_id); > return 1; > } > diff --git a/examples/uc_pingpong.c b/examples/uc_pingpong.c > index 6135030..c09c8c1 100644 > --- a/examples/uc_pingpong.c > +++ b/examples/uc_pingpong.c > @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) > > for (i = 0; i < ne; ++i) { > if (wc[i].status != IBV_WC_SUCCESS) { > - fprintf(stderr, "Failed status %d for wr_id %d\n", > + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", > + ibv_wc_status_str(wc[i].status), > wc[i].status, (int) wc[i].wr_id); > return 1; > } > diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c > index aaee26c..8f3d50b 100644 > --- a/examples/ud_pingpong.c > +++ b/examples/ud_pingpong.c > @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) > > for (i = 0; i < ne; ++i) { > if (wc[i].status != IBV_WC_SUCCESS) { > - fprintf(stderr, "Failed status %d for wr_id %d\n", > + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", > + ibv_wc_status_str(wc[i].status), > wc[i].status, (int) wc[i].wr_id); > return 1; > } > diff --git a/include/infiniband/verbs.h 
b/include/infiniband/verbs.h > index a51bb9d..a04cc62 100644 > --- a/include/infiniband/verbs.h > +++ b/include/infiniband/verbs.h > @@ -238,6 +238,7 @@ enum ibv_wc_status { > IBV_WC_RESP_TIMEOUT_ERR, > IBV_WC_GENERAL_ERR > }; > +const char *ibv_wc_status_str(enum ibv_wc_status status); > > enum ibv_wc_opcode { > IBV_WC_SEND, > @@ -1077,6 +1078,21 @@ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); > */ > int ibv_fork_init(void); > > +/** > + * ibv_node_type_str - Return string describing node_type enum value > + */ > +const char *ibv_node_type_str(enum ibv_node_type node_type); > + > +/** > + * ibv_port_state_str - Return string describing port_state enum value > + */ > +const char *ibv_port_state_str(enum ibv_port_state port_state); > + > +/** > + * ibv_event_type_str - Return string describing event_type enum value > + */ > +const char *ibv_event_type_str(enum ibv_event_type event); > + > END_C_DECLS > > # undef __attribute_const > diff --git a/man/ibv_event_type_str.3 b/man/ibv_event_type_str.3 > new file mode 100644 > index 0000000..0df8fcd > --- /dev/null > +++ b/man/ibv_event_type_str.3 > @@ -0,0 +1,40 @@ > +.\" -*- nroff -*- > +.\" > +.TH IBV_EVENT_TYPE_STR 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" > +.SH "NAME" > +.nf > +ibv_event_type_str \- Return string describing event_type enum value > +.nl > +ibv_node_type_str \- Return string describing node_type enum value > +.nl > +ibv_port_state_str \- Return string describing port_state enum value > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "const char *ibv_event_type_str(enum ibv_event_type " "event_type"); > +.nl > +.BI "const char *ibv_node_type_str(enum ibv_node_type " "node_type"); > +.nl > +.BI "const char *ibv_port_state_str(enum ibv_port_state " "port_state"); > +.fi > +.SH "DESCRIPTION" > +.B ibv_node_type_str() > +returns a string describing the node type enum value > +.IR node_type . 
> +.PP > +.B ibv_port_state_str() > +returns a string describing the port state enum value > +.IR port_state . > +.PP > +.B ibv_event_type_str() > +returns a string describing the event type enum value > +.IR event_type . > +.SH "RETURN VALUE" > +These functions return a constant string that describes the enum value > +passed as their argument. > +.SH "AUTHOR" > +.TP > +Roland Dreier > +.RI < rolandd at cisco.com > > + > diff --git a/src/enum_strs.c b/src/enum_strs.c > new file mode 100644 > index 0000000..c57feaa > --- /dev/null > +++ b/src/enum_strs.c > @@ -0,0 +1,127 @@ > +/* > + * Copyright (c) 2008 Lawrence Livermore National Laboratory > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + */ > + > +#include > + > +const char *ibv_node_type_str(enum ibv_node_type node_type) > +{ > + static const char *const node_type_str[] = { > + [IBV_NODE_CA] = "InfiniBand channel adapter", > + [IBV_NODE_SWITCH] = "InfiniBand switch", > + [IBV_NODE_ROUTER] = "InfiniBand router", > + [IBV_NODE_RNIC] = "iWARP NIC" > + }; > + > + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) > + return "unknown"; > + > + return node_type_str[node_type]; > +} > + > +const char *ibv_port_state_str(enum ibv_port_state port_state) > +{ > + static const char *const port_state_str[] = { > + [IBV_PORT_NOP] = "no state change (NOP)", > + [IBV_PORT_DOWN] = "down", > + [IBV_PORT_INIT] = "init", > + [IBV_PORT_ARMED] = "armed", > + [IBV_PORT_ACTIVE] = "active", > + [IBV_PORT_ACTIVE_DEFER] = "active defer" > + }; > + > + if (port_state < IBV_PORT_NOP || port_state > IBV_PORT_ACTIVE_DEFER) > + return "unknown"; > + > + return port_state_str[port_state]; > +} > + > +const char *ibv_event_type_str(enum ibv_event_type event) > +{ > + static const char *const event_type_str[] = { > + [IBV_EVENT_CQ_ERR] = "CQ error", > + [IBV_EVENT_QP_FATAL] = "local work queue catastrophic error", > + [IBV_EVENT_QP_REQ_ERR] = "invalid request local work queue error", > + [IBV_EVENT_QP_ACCESS_ERR] = "local access violation work queue error", > + [IBV_EVENT_COMM_EST] = "communication established", > + [IBV_EVENT_SQ_DRAINED] = "send queue drained", > + [IBV_EVENT_PATH_MIG] = "path migrated", > + [IBV_EVENT_PATH_MIG_ERR] = "path migration request error", > + [IBV_EVENT_DEVICE_FATAL] = "local catastrophic error", > + [IBV_EVENT_PORT_ACTIVE] = "port active", > + [IBV_EVENT_PORT_ERR] = "port error", > + [IBV_EVENT_LID_CHANGE] = "LID 
change", > + [IBV_EVENT_PKEY_CHANGE] = "P_Key change", > + [IBV_EVENT_SM_CHANGE] = "SM change", > + [IBV_EVENT_SRQ_ERR] = "SRQ catastrophic error", > + [IBV_EVENT_SRQ_LIMIT_REACHED] = "SRQ limit reached", > + [IBV_EVENT_QP_LAST_WQE_REACHED] = "last WQE reached", > + [IBV_EVENT_CLIENT_REREGISTER] = "client reregistration", > + }; > + > + if (event < IBV_EVENT_CQ_ERR || event > IBV_EVENT_CLIENT_REREGISTER) > + return "unknown"; > + > + return event_type_str[event]; > +} > + > +const char *ibv_wc_status_str(enum ibv_wc_status status) > +{ > + static const char *const wc_status_str[] = { > + [IBV_WC_SUCCESS] = "success", > + [IBV_WC_LOC_LEN_ERR] = "local length error", > + [IBV_WC_LOC_QP_OP_ERR] = "local QP operation error", > + [IBV_WC_LOC_EEC_OP_ERR] = "local EE context operation error", > + [IBV_WC_LOC_PROT_ERR] = "local protection error", > + [IBV_WC_WR_FLUSH_ERR] = "Work Request Flushed Error", > + [IBV_WC_MW_BIND_ERR] = "memory management operation error", > + [IBV_WC_BAD_RESP_ERR] = "bad response error", > + [IBV_WC_LOC_ACCESS_ERR] = "local access error", > + [IBV_WC_REM_INV_REQ_ERR] = "remote invalid request error", > + [IBV_WC_REM_ACCESS_ERR] = "remote access error", > + [IBV_WC_REM_OP_ERR] = "remote operation error", > + [IBV_WC_RETRY_EXC_ERR] = "transport retry counter exceeded", > + [IBV_WC_RNR_RETRY_EXC_ERR] = "RNR retry counter exceeded", > + [IBV_WC_LOC_RDD_VIOL_ERR] = "local RDD violation error", > + [IBV_WC_REM_INV_RD_REQ_ERR] = "remote invalid RD request", > + [IBV_WC_REM_ABORT_ERR] = "aborted error", > + [IBV_WC_INV_EECN_ERR] = "invalid EE context number", > + [IBV_WC_INV_EEC_STATE_ERR] = "invalid EE context state", > + [IBV_WC_FATAL_ERR] = "fatal error", > + [IBV_WC_RESP_TIMEOUT_ERR] = "response timeout error", > + [IBV_WC_GENERAL_ERR] = "general error" > + }; > + > + if (status < IBV_WC_SUCCESS || status > IBV_WC_GENERAL_ERR) > + return "unknown"; > + > + return wc_status_str[status]; > +} > diff --git a/src/libibverbs.map b/src/libibverbs.map > 
index 3a346ed..1827da0 100644 > --- a/src/libibverbs.map > +++ b/src/libibverbs.map > @@ -91,4 +91,9 @@ IBVERBS_1.1 { > ibv_dontfork_range; > ibv_dofork_range; > ibv_register_driver; > + > + ibv_node_type_str; > + ibv_port_state_str; > + ibv_event_type_str; > + ibv_wc_status_str; > } IBVERBS_1.0; From rdreier at cisco.com Fri Apr 18 16:04:50 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 16:04:50 -0700 Subject: [ofa-general] [ANNOUNCE] libibverbs-1.1.2 is released Message-ID: libibverbs is a library that allows programs to use RDMA "verbs" for direct access to RDMA (currently InfiniBand and iWARP) hardware from userspace. The new stable release, 1.1.2, is available from http://www.openfabrics.org//downloads/verbs/libibverbs-1.1.2.tar.gz with sha1sum 7d35b9a0ee45b2ec2e9da5c50565197155a94b5c libibverbs-1.1.2.tar.gz I also pushed the latest tree and tag out to kernel.org: git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git (the name of the tag is libibverbs-1.1.2). This release has various small fixes, including a lot of improvements to the Valgrind annotations, and also adds ibv_xxx_str() functions for printing enum values. 
The git shortlog since libibverbs 1.1.1 is: Dotan Barak (5): Initialize reserved attributes in modify QP command Fix several valgrind false positives Fix some issues in the examples Fixes for man pages Add command line parameter to set SL for pingpong examples Ira Weiny (1): Add functions to convert enum values to strings Or Gerlitz (1): Document IBV_SEND_INLINE buffer ownership Roland Dreier (16): Remove deprecated ${Source-Version} from debian/control Add to Clean up NVALGRIND comment in config.h.in Fix Valgrind annotations so they can actually be built Fix too-big madvise() call in ibv_madvise_range() Fix spec file License: tag Always return valid bad_wr on error from ibv_post_{send,recv,srq_recv} Update Debian policy version to 3.7.3 Use real Homepage: tag instead of pseudo-header inside description Convert hyphen to minus sign in ibv_query_pkey man page Put correct version information in Debian shlibs Add debian/watch file Fix download directory in RPM spec file Update various text to talk about general RDMA, not just InfiniBand Correct typo ibv_mult_to_rate -> mult_to_ibv_rate in man page Add RPM dependency on base package to -devel package Roll libibverbs 1.1.2 release Troy Benjegerdes (1): Fix valgrind false positive in ibv_create_comp_channel() swelch at systemfabricworks.com (1): Set ibv_device->node_type when allocating device
From mingo at elte.hu Sat Apr 19 01:16:14 2008 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 19 Apr 2008 10:16:14 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: Message-ID: <20080419081614.GA2437@elte.hu> * Roland Dreier wrote: > IB/ipath: Misc changes to prepare for IB7220 introduction > IB/ipath: add calls to new 7220 code and enable in build x86.git auto-testing found that these changes broke the -git build, with this config: http://redhat.com/~mingo/misc/config-Sat_Apr_19_09_55_05_CEST_2008.bad the failure is a link failure: drivers/built-in.o: In function `ipath_init_one': ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs' disabling CONFIG_INFINIBAND_IPATH=y works this around.
Ingo From rdreier at cisco.com Sat Apr 19 07:11:20 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 19 Apr 2008 07:11:20 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: <20080419081614.GA2437@elte.hu> (Ingo Molnar's message of "Sat, 19 Apr 2008 10:16:14 +0200") References: <20080419081614.GA2437@elte.hu> Message-ID: > x86.git auto-testing found that these changes broke the -git build, with > this config: > > http://redhat.com/~mingo/misc/config-Sat_Apr_19_09_55_05_CEST_2008.bad > > the failure is a link failure: > > drivers/built-in.o: In function `ipath_init_one': > ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs' Thanks. The relevant parts of the config are # CONFIG_PCI_MSI is not set CONFIG_HT_IRQ=y CONFIG_INFINIBAND_IPATH=y The problem is that the iba7220 files don't get built in that case, but the main driver file tries to call ipath_init_iba7220 anyway. This is fixed by the patch below, which makes the iba7220 file build unconditionally. I also removed the dependency on HT_IRQ || PCI_MSI in the Kconfig, since the iba7220 support should work without it. I know we discussed this before, but looking closer at the code, the dependency seems pointless to me, since it's still possible to build a driver that doesn't work if a particular system needs, say HT_IRQ, and the user selects PCI_MSI. And since iba7220 doesn't need either, we might as well let people build that. If this is OK with everyone, I will merge this with a proper changelog. - R. 
diff --git a/drivers/infiniband/hw/ipath/Kconfig b/drivers/infiniband/hw/ipath/Kconfig index 044da58..3c7968f 100644 --- a/drivers/infiniband/hw/ipath/Kconfig +++ b/drivers/infiniband/hw/ipath/Kconfig @@ -1,6 +1,6 @@ config INFINIBAND_IPATH tristate "QLogic InfiniPath Driver" - depends on (PCI_MSI || HT_IRQ) && 64BIT && NET + depends on 64BIT && NET ---help--- This is a driver for QLogic InfiniPath host channel adapters, including InfiniBand verbs support. This driver allows these diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile index 75a6c91..bf94500 100644 --- a/drivers/infiniband/hw/ipath/Makefile +++ b/drivers/infiniband/hw/ipath/Makefile @@ -29,11 +29,13 @@ ib_ipath-y := \ ipath_user_pages.o \ ipath_user_sdma.o \ ipath_verbs_mcast.o \ - ipath_verbs.o + ipath_verbs.o \ + ipath_iba7220.o \ + ipath_sd7220.o \ + ipath_sd7220_img.o ib_ipath-$(CONFIG_HT_IRQ) += ipath_iba6110.o ib_ipath-$(CONFIG_PCI_MSI) += ipath_iba6120.o -ib_ipath-$(CONFIG_PCI_MSI) += ipath_iba7220.o ipath_sd7220.o ipath_sd7220_img.o ib_ipath-$(CONFIG_X86_64) += ipath_wc_x86_64.o ib_ipath-$(CONFIG_PPC64) += ipath_wc_ppc64.o From rdreier at cisco.com Sat Apr 19 07:18:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 19 Apr 2008 07:18:24 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: (Roland Dreier's message of "Sat, 19 Apr 2008 07:11:20 -0700") References: <20080419081614.GA2437@elte.hu> Message-ID: By the way (only peripherally related), it seems all the #ifdef CONFIG_PCI_MSI tests in ipath_iba7220.c can be removed, since the code should work fine even if PCI_MSI is not set... - R. 
From dave.olson at qlogic.com Sat Apr 19 08:20:49 2008 From: dave.olson at qlogic.com (Dave Olson) Date: Sat, 19 Apr 2008 08:20:49 -0700 (PDT) Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20080419081614.GA2437@elte.hu> Message-ID: On Sat, 19 Apr 2008, Roland Dreier wrote: | > drivers/built-in.o: In function `ipath_init_one': | > ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs' Yes, that issue should be fixed. Our preference was to not build if it wouldn't work. We'd have to add the conditional check at the function setup routines. | I also removed the dependency on HT_IRQ || PCI_MSI in the Kconfig, since | the iba7220 support should work without it. I know we discussed this | before, but looking closer at the code, the dependency seems pointless | to me, since it's still possible to build a driver that doesn't work if | a particular system needs, say HT_IRQ, and the user selects PCI_MSI. | And since iba7220 doesn't need either, we might as well let people build | that. | | If this is OK with everyone, I will merge this with a proper changelog. At this point, I guess I'd agree. We've added checks for "no interrupt" after the driver is loaded, so I guess that covers the issue well enough. Dave Olson dave.olson at qlogic.com From rdreier at cisco.com Sat Apr 19 09:12:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 19 Apr 2008 09:12:06 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: (Dave Olson's message of "Sat, 19 Apr 2008 08:20:49 -0700 (PDT)") References: <20080419081614.GA2437@elte.hu> Message-ID: > | > drivers/built-in.o: In function `ipath_init_one': > | > ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs' > > Yes, that issue should be fixed. Our preference was to not build > if it wouldn't work. We'd have to add the conditional check at > the function setup routines. 
Not sure I really follow this response... ipath_driver.c has case PCI_DEVICE_ID_INFINIPATH_7220: #ifndef CONFIG_PCI_MSI ipath_dbg("CONFIG_PCI_MSI is not enabled, " "using IntX for unit %u\n", dd->ipath_unit); #endif ipath_init_iba7220_funcs(dd); break; so clearly ipath_init_iba7220_funcs() was intended to be built and used even if CONFIG_PCI_MSI was not defined. From the code it looks like all should work fine if PCI_MSI is not set, so I don't know what you mean about conditional checks. (BTW since I'm looking at this code, "IntX" should probably be capitalized as "INTx" to match what the PCI specs say) - R.
From dave.olson at qlogic.com Sun Apr 20 07:47:56 2008 From: dave.olson at qlogic.com (Dave Olson) Date: Sun, 20 Apr 2008 07:47:56 -0700 (PDT) Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20080419081614.GA2437@elte.hu> Message-ID: On Sat, 19 Apr 2008, Roland Dreier wrote: | Not sure I really follow this response... ipath_driver.c has | | case PCI_DEVICE_ID_INFINIPATH_7220: | #ifndef CONFIG_PCI_MSI | ipath_dbg("CONFIG_PCI_MSI is not enabled, " | "using IntX for unit %u\n", dd->ipath_unit); | #endif | ipath_init_iba7220_funcs(dd); | break; | | so clearly ipath_init_iba7220_funcs() was intended to be built and used | even if CONFIG_PCI_MSI was not defined. From the code it looks like all | should work fine if PCI_MSI is not set, so I don't know what you mean | about conditional checks. Actually, it wasn't. It was a late cleanup for another problem, and we didn't worry about the other issue, and should have. | (BTW since I'm looking at this code, "IntX" should probably be | capitalized as "INTx" to match what the PCI specs say) True. Dave Olson dave.olson at qlogic.com From mashirle at us.ibm.com Sun Apr 20 01:52:31 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Sun, 20 Apr 2008 01:52:31 -0700 Subject: [ofa-general] [PATCH] IPoIB 4K MTU support Message-ID: <1208681551.5271.11.camel@localhost.localdomain> Hello Roland, I have recreated the IPoIB 4K MTU patch. The patch below is built against the 2.6.25 kernel, for submission into 2.6.26. Please review and integrate it, and let me know if there are any problems. Thanks Shirley This patch enables IPoIB 4K MTU support by using two S/G buffers when PAGE_SIZE is less than or equal to the HCA IB MTU size. The first buffer is for the IPoIB header + GRH header. The second buffer is the IPoIB payload, which is 4K-4.
Signed-off-by: Shirley Ma --- drivers/infiniband/ulp/ipoib/ipoib.h | 50 +++++++++++++- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 86 +++++++++++++---------- drivers/infiniband/ulp/ipoib/ipoib_main.c | 19 ++++-- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 3 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 15 ++++- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 1 + 6 files changed, 125 insertions(+), 49 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 73b2b17..6a05ead 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -56,11 +56,11 @@ /* constants */ enum { - IPOIB_PACKET_SIZE = 2048, - IPOIB_BUF_SIZE = IPOIB_PACKET_SIZE + IB_GRH_BYTES, - IPOIB_ENCAP_LEN = 4, + IPOIB_UD_HEAD_SIZE = IB_GRH_BYTES + IPOIB_ENCAP_LEN, + IPOIB_UD_RX_SG = 2, /* max buffer needed for 4K mtu */ + IPOIB_CM_MTU = 0x10000 - 0x10, /* padding to align header to 16 */ IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU + IPOIB_ENCAP_LEN, IPOIB_CM_HEAD_SIZE = IPOIB_CM_BUF_SIZE % PAGE_SIZE, @@ -139,7 +139,7 @@ struct ipoib_mcast { struct ipoib_rx_buf { struct sk_buff *skb; - u64 mapping; + u64 mapping[IPOIB_UD_RX_SG]; }; struct ipoib_tx_buf { @@ -294,6 +294,7 @@ struct ipoib_dev_priv { unsigned int admin_mtu; unsigned int mcast_mtu; + unsigned int max_ib_mtu; struct ipoib_rx_buf *rx_ring; @@ -305,6 +306,9 @@ struct ipoib_dev_priv { struct ib_send_wr tx_wr; unsigned tx_outstanding; + struct ib_recv_wr rx_wr; + struct ib_sge rx_sge[IPOIB_UD_RX_SG]; + struct ib_wc ibwc[IPOIB_NUM_WC]; struct list_head dead_ahs; @@ -366,6 +370,44 @@ struct ipoib_neigh { struct list_head list; }; +#define IPOIB_UD_MTU(ib_mtu) (ib_mtu - IPOIB_ENCAP_LEN) +#define IPOIB_UD_BUF_SIZE(ib_mtu) (ib_mtu + IB_GRH_BYTES) + +static inline int ipoib_ud_need_sg(unsigned int ib_mtu) +{ + return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 
1 : 0; +} + +static inline void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv, + u64 mapping[IPOIB_UD_RX_SG]) +{ + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE); + ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE); + } else + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE); +} + +static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv, + struct sk_buff *skb, + unsigned int length) +{ + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[0]; + /* + * There is only two buffers needed for max_payload = 4K, + * first buf size is IPOIB_UD_HEAD_SIZE + */ + skb->tail += IPOIB_UD_HEAD_SIZE; + frag->size = length - IPOIB_UD_HEAD_SIZE; + skb->data_len += frag->size; + skb->truesize += frag->size; + skb->len += length; + } else + skb_put(skb, length); + +} + /* * We stash a pointer to our private neighbour information after our * hardware address in neigh->ha. 
The ALIGN() expression here makes diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 0205eb7..8b3f1b2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -92,25 +92,18 @@ void ipoib_free_ah(struct kref *kref) static int ipoib_ib_post_receive(struct net_device *dev, int id) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ib_sge list; - struct ib_recv_wr param; struct ib_recv_wr *bad_wr; int ret; - list.addr = priv->rx_ring[id].mapping; - list.length = IPOIB_BUF_SIZE; - list.lkey = priv->mr->lkey; + priv->rx_wr.wr_id = id | IPOIB_OP_RECV; + priv->rx_sge[0].addr = priv->rx_ring[id].mapping[0]; + priv->rx_sge[1].addr = priv->rx_ring[id].mapping[1]; + - param.next = NULL; - param.wr_id = id | IPOIB_OP_RECV; - param.sg_list = &list; - param.num_sge = 1; - - ret = ib_post_recv(priv->qp, ¶m, &bad_wr); + ret = ib_post_recv(priv->qp, &priv->rx_wr, &bad_wr); if (unlikely(ret)) { ipoib_warn(priv, "receive failed for buf %d (%d)\n", id, ret); - ib_dma_unmap_single(priv->ca, priv->rx_ring[id].mapping, - IPOIB_BUF_SIZE, DMA_FROM_DEVICE); + ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[id].mapping); dev_kfree_skb_any(priv->rx_ring[id].skb); priv->rx_ring[id].skb = NULL; } @@ -118,15 +111,22 @@ static int ipoib_ib_post_receive(struct net_device *dev, int id) return ret; } -static int ipoib_alloc_rx_skb(struct net_device *dev, int id) +static struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev, + int id, + u64 mapping[IPOIB_UD_RX_SG]) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct sk_buff *skb; - u64 addr; + int buf_size; - skb = dev_alloc_skb(IPOIB_BUF_SIZE + 4); - if (!skb) - return -ENOMEM; + if (ipoib_ud_need_sg(priv->max_ib_mtu)) + buf_size = IPOIB_UD_HEAD_SIZE; + else + buf_size = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu); + + skb = dev_alloc_skb(buf_size + 4); + if (unlikely(!skb)) + return NULL; /* * IB will leave a 40 byte gap for a GRH and IPoIB adds a 4 
byte @@ -135,17 +135,31 @@ static int ipoib_alloc_rx_skb(struct net_device *dev, int id) */ skb_reserve(skb, 4); - addr = ib_dma_map_single(priv->ca, skb->data, IPOIB_BUF_SIZE, - DMA_FROM_DEVICE); - if (unlikely(ib_dma_mapping_error(priv->ca, addr))) { + mapping[0] = ib_dma_map_single(priv->ca, skb->data, buf_size, + DMA_FROM_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) { dev_kfree_skb_any(skb); - return -EIO; + return NULL; + } + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { + struct page *page = alloc_page(GFP_ATOMIC); + if (!page) + goto partial_error; + skb_fill_page_desc(skb, 0, page, 0, PAGE_SIZE); + mapping[1] = ib_dma_map_page(priv->ca, + skb_shinfo(skb)->frags[0].page, + 0, PAGE_SIZE, DMA_FROM_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, mapping[1]))) + goto partial_error; } - priv->rx_ring[id].skb = skb; - priv->rx_ring[id].mapping = addr; + priv->rx_ring[id].skb = skb; + return skb; - return 0; +partial_error: + ib_dma_unmap_single(priv->ca, mapping[0], buf_size, DMA_FROM_DEVICE); + dev_kfree_skb_any(skb); + return NULL; } static int ipoib_ib_post_receives(struct net_device *dev) @@ -154,7 +168,7 @@ static int ipoib_ib_post_receives(struct net_device *dev) int i; for (i = 0; i < ipoib_recvq_size; ++i) { - if (ipoib_alloc_rx_skb(dev, i)) { + if (!ipoib_alloc_rx_skb(dev, i, priv->rx_ring[i].mapping)) { ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); return -ENOMEM; } @@ -172,7 +186,7 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) struct ipoib_dev_priv *priv = netdev_priv(dev); unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV; struct sk_buff *skb; - u64 addr; + u64 mapping[IPOIB_UD_RX_SG]; ipoib_dbg_data(priv, "recv completion: id %d, status: %d\n", wr_id, wc->status); @@ -184,15 +198,13 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) } skb = priv->rx_ring[wr_id].skb; - addr = priv->rx_ring[wr_id].mapping; if (unlikely(wc->status != IB_WC_SUCCESS)) { 
if (wc->status != IB_WC_WR_FLUSH_ERR) ipoib_warn(priv, "failed recv event " "(status=%d, wrid=%d vend_err %x)\n", wc->status, wr_id, wc->vendor_err); - ib_dma_unmap_single(priv->ca, addr, - IPOIB_BUF_SIZE, DMA_FROM_DEVICE); + ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[wr_id].mapping); dev_kfree_skb_any(skb); priv->rx_ring[wr_id].skb = NULL; return; @@ -209,7 +221,7 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) * If we can't allocate a new RX buffer, dump * this packet and reuse the old buffer. */ - if (unlikely(ipoib_alloc_rx_skb(dev, wr_id))) { + if (unlikely(!ipoib_alloc_rx_skb(dev, wr_id, mapping))) { ++dev->stats.rx_dropped; goto repost; } @@ -217,9 +229,11 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", wc->byte_len, wc->slid); - ib_dma_unmap_single(priv->ca, addr, IPOIB_BUF_SIZE, DMA_FROM_DEVICE); + ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[wr_id].mapping); + ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); + memcpy(priv->rx_ring[wr_id].mapping, mapping, + IPOIB_UD_RX_SG * sizeof *mapping); - skb_put(skb, wc->byte_len); skb_pull(skb, IB_GRH_BYTES); skb->protocol = ((struct ipoib_header *) skb->data)->proto; @@ -733,10 +747,8 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) rx_req = &priv->rx_ring[i]; if (!rx_req->skb) continue; - ib_dma_unmap_single(priv->ca, - rx_req->mapping, - IPOIB_BUF_SIZE, - DMA_FROM_DEVICE); + ipoib_ud_dma_unmap_rx(priv, + priv->rx_ring[i].mapping); dev_kfree_skb_any(rx_req->skb); rx_req->skb = NULL; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index bd07f02..ee4c45a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -195,7 +195,7 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu) return 0; } - if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) + if (new_mtu > 
IPOIB_UD_MTU(priv->max_ib_mtu)) return -EINVAL; priv->admin_mtu = new_mtu; @@ -971,10 +971,6 @@ static void ipoib_setup(struct net_device *dev) NETIF_F_LLTX | NETIF_F_HIGHDMA); - /* MTU will be reset when mcast join happens */ - dev->mtu = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN; - priv->mcast_mtu = priv->admin_mtu = dev->mtu; - memcpy(dev->broadcast, ipv4_bcast_addr, INFINIBAND_ALEN); netif_carrier_off(dev); @@ -1107,6 +1103,7 @@ static struct net_device *ipoib_add_port(const char *format, { struct ipoib_dev_priv *priv; struct ib_device_attr *device_attr; + struct ib_port_attr attr; int result = -ENOMEM; priv = ipoib_intf_alloc(format); @@ -1115,6 +1112,18 @@ static struct net_device *ipoib_add_port(const char *format, SET_NETDEV_DEV(priv->dev, hca->dma_device); + if (!ib_query_port(hca, port, &attr)) + priv->max_ib_mtu = ib_mtu_enum_to_int(attr.max_mtu); + else { + printk(KERN_WARNING "%s: ib_query_port %d failed\n", + hca->name, port); + goto device_init_failed; + } + + /* MTU will be reset when mcast join happens */ + priv->dev->mtu = IPOIB_UD_MTU(priv->max_ib_mtu); + priv->mcast_mtu = priv->admin_mtu = priv->dev->mtu; + result = ib_query_pkey(hca, port, 0, &priv->pkey); if (result) { printk(KERN_WARNING "%s: ib_query_pkey port %d failed (ret = %d)\n", diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 31a53c5..b9faef2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -567,8 +567,7 @@ void ipoib_mcast_join_task(struct work_struct *work) return; } - priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu) - - IPOIB_ENCAP_LEN; + priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu)); if (!ipoib_cm_admin_enabled(dev)) dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 8a20e37..a7d4bcb 100644 --- 
a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -150,7 +150,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) .max_send_wr = ipoib_sendq_size, .max_recv_wr = ipoib_recvq_size, .max_send_sge = 1, - .max_recv_sge = 1 + .max_recv_sge = IPOIB_UD_RX_SG }, .sq_sig_type = IB_SIGNAL_ALL_WR, .qp_type = IB_QPT_UD @@ -215,6 +215,19 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) priv->tx_wr.sg_list = priv->tx_sge; priv->tx_wr.send_flags = IB_SEND_SIGNALED; + priv->rx_sge[0].lkey = priv->mr->lkey; + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { + priv->rx_sge[0].length = IPOIB_UD_HEAD_SIZE; + priv->rx_sge[1].length = PAGE_SIZE; + priv->rx_sge[1].lkey = priv->mr->lkey; + priv->rx_wr.num_sge = IPOIB_UD_RX_SG; + } else { + priv->rx_sge[0].length = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu); + priv->rx_wr.num_sge = 1; + } + priv->rx_wr.next = NULL; + priv->rx_wr.sg_list = priv->rx_sge; + return 0; out_free_cq: diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 293f5b8..431fdea 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -89,6 +89,7 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) goto err; } + priv->max_ib_mtu = ppriv->max_ib_mtu; set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags); priv->pkey = pkey; From rdreier at cisco.com Sun Apr 20 18:55:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 20 Apr 2008 18:55:23 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: (Dave Olson's message of "Sun, 20 Apr 2008 07:47:56 -0700 (PDT)") References: <20080419081614.GA2437@elte.hu> Message-ID: > | Not sure I really follow this response... 
ipath_driver.c has > | > | case PCI_DEVICE_ID_INFINIPATH_7220: > | #ifndef CONFIG_PCI_MSI > | ipath_dbg("CONFIG_PCI_MSI is not enabled, " > | "using IntX for unit %u\n", dd->ipath_unit); > | #endif > | ipath_init_iba7220_funcs(dd); > | break; > | > | so clearly ipath_init_iba7220_funcs() was intended to be built and used > | even if CONFIG_PCI_MSI was not defined. From the code it looks like all > | should work fine if PCI_MSI is not set, so I don't know what you mean > | about conditional checks. > > Actually, it wasn't. It was a late cleanup for another problem, and > we didn't worry about the other issue, and should have. Sorry, I still don't follow. What is the antecedent of "it"? What was "the other issue"? I'm not sure I know the right fix for the build breakage. It seems there are two possibilities: - build the iba7220 support unconditionally (the patch I posted). - change the case statement I quoted above so that the ipath_init_iba7220_funcs() call is inside the #ifdef block (and add an error message if CONFIG_PCI_MSI is not defined, as for the 6120 block in the same case statement). Since it seems iba7220 works with INTx interrupts, the first choice makes the most sense to me. And since all the pci_msi functions have stubs that just fail unconditionally if CONFIG_PCI_MSI is not defined, it seems we can remove the #ifdef CONFIG_PCI_MSI from the iba7220 files. And given that at least some device support works even if neither PCI_MSI nor HT_IRQ is defined, then it makes sense to me to remove that Kconfig dependency. If I have something wrong, please let me know. - R. 
From dave.olson at qlogic.com Sun Apr 20 19:35:17 2008 From: dave.olson at qlogic.com (Dave Olson) Date: Sun, 20 Apr 2008 19:35:17 -0700 (PDT) Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20080419081614.GA2437@elte.hu> Message-ID: On Sun, 20 Apr 2008, Roland Dreier wrote: | > | so clearly ipath_init_iba7220_funcs() was intended to be built and used | > | even if CONFIG_PCI_MSI was not defined. From the code it looks like all | > | should work fine if PCI_MSI is not set, so I don't know what you mean | > | about conditional checks. | > | > Actually, it wasn't. It was a late cleanup for another problem, and | > we didn't worry about the other issue, and should have. | | Sorry, I still don't follow. What is the antecedent of "it"? What was | "the other issue"? The CONFIG_PCI_MSI check where init_iba7220 is called. | I'm not sure I know the right fix for the build breakage. It seems | there are two possibilities: | | - build the iba7220 support unconditionally (the patch I posted). Yep; I already said I was OK with that. It's simplest, let's go with it. | And given that at least some device support works even if neither | PCI_MSI nor HT_IRQ is defined, then it makes sense to me to remove that | Kconfig dependency. Go ahead. Dave Olson dave.olson at qlogic.com From yevgenyp at mellanox.co.il Sun Apr 20 23:25:32 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 21 Apr 2008 09:25:32 +0300 Subject: [ofa-general][PATCH] mlx4: Qp range reservation (MP support, Patch 2) In-Reply-To: References: <480891F7.8090807@mellanox.co.il> Message-ID: <480C335C.6090606@mellanox.co.il> Roland Dreier wrote: > > +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, > > + u32 num, u32 mask, u32 reserved, > > + u32 effective_max) > > This patch adds effective_max stuff but I don't see how it's used anywhere?? > > - R. 
We use effective_max when there is a reserved range not only at the beginning of the bitmap but also at the end. One example: we reserve QP ranges for the FCoE and Ethernet modules. Thanks, Yevgeny From fenkes at de.ibm.com Mon Apr 21 01:03:10 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:03:10 +0100 Subject: [ofa-general] [PATCH 0/5] IB/ehca: IB compliance fix, tracing verbosity and module parameters Message-ID: <200804211003.10695.fenkes@de.ibm.com> [1/5] makes the driver reject SQ WRs if the QP is not in RTS [2/5] bumps a lot of tracing into higher debug_levels [3/5] removes the mr_largepage parameter [4/5] changes some bool-ish module parms into actual bools, also updates some descriptions [5/5] bumps the version number to 0026 Please review these patches and queue them for inclusion into 2.6.26 if you think they're okay. Thanks! Joachim -- Joachim Fenkes -- eHCA Linux Driver Developer and Hardware Tamer IBM Deutschland Entwicklung GmbH -- Dept. 3627 (I/O Firmware Dev. 2) Schoenaicher Strasse 220 -- 71032 Boeblingen -- Germany eMail: fenkes at de.ibm.com From fenkes at de.ibm.com Mon Apr 21 01:04:44 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:04:44 +0100 Subject: [ofa-general] [PATCH 1/5] IB/ehca: Prevent posting of SQ WQEs if QP not in RTS In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211004.44666.fenkes@de.ibm.com> ...as required by IB Spec, C10-29.
Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_classes.h | 1 + drivers/infiniband/hw/ehca/ehca_qp.c | 3 +++ drivers/infiniband/hw/ehca/ehca_reqs.c | 5 +++++ 3 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 0d13fe0..3d6d946 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -160,6 +160,7 @@ struct ehca_qp { }; u32 qp_type; enum ehca_ext_qp_type ext_type; + enum ib_qp_state state; struct ipz_queue ipz_squeue; struct ipz_queue ipz_rqueue; struct h_galpas galpas; diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index 3eb14a5..5a653d7 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -550,6 +550,7 @@ static struct ehca_qp *internal_create_qp( spin_lock_init(&my_qp->spinlock_r); my_qp->qp_type = qp_type; my_qp->ext_type = parms.ext_type; + my_qp->state = IB_QPS_RESET; if (init_attr->recv_cq) my_qp->recv_cq = @@ -1508,6 +1509,8 @@ static int internal_modify_qp(struct ib_qp *ibqp, if (attr_mask & IB_QP_QKEY) my_qp->qkey = attr->qkey; + my_qp->state = qp_new_state; + modify_qp_exit2: if (squeue_locked) { /* this means: sqe -> rts */ spin_unlock_irqrestore(&my_qp->spinlock_s, flags); diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index a20bbf4..0b2359e 100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -421,6 +421,11 @@ int ehca_post_send(struct ib_qp *qp, int ret = 0; unsigned long flags; + if (unlikely(my_qp->state != IB_QPS_RTS)) { + ehca_err(qp->device, "QP not in RTS state qpn=%x", qp->qp_num); + return -EINVAL; + } + /* LOCK the QUEUE */ spin_lock_irqsave(&my_qp->spinlock_s, flags); -- 1.5.5 From fenkes at de.ibm.com Mon Apr 21 01:05:26 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:05:26 
+0100 Subject: [ofa-general] [PATCH 2/5] IB/ehca: Move high-volume debug output to higher debug levels In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211005.26567.fenkes@de.ibm.com> Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_irq.c | 2 +- drivers/infiniband/hw/ehca/ehca_main.c | 14 ++++++-- drivers/infiniband/hw/ehca/ehca_mrmw.c | 16 ++++++---- drivers/infiniband/hw/ehca/ehca_qp.c | 12 ++++---- drivers/infiniband/hw/ehca/ehca_reqs.c | 46 ++++++++++++++--------------- drivers/infiniband/hw/ehca/ehca_uverbs.c | 6 +-- drivers/infiniband/hw/ehca/hcp_if.c | 23 ++++++++------- 7 files changed, 63 insertions(+), 56 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index b5ca94c..ca5eb0c 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -633,7 +633,7 @@ static inline int find_next_online_cpu(struct ehca_comp_pool *pool) unsigned long flags; WARN_ON_ONCE(!in_interrupt()); - if (ehca_debug_level) + if (ehca_debug_level >= 3) ehca_dmp(&cpu_online_map, sizeof(cpumask_t), ""); spin_lock_irqsave(&pool->last_cpu_lock, flags); diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 65b3362..4379bef 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -85,8 +85,8 @@ module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); MODULE_PARM_DESC(open_aqp1, "AQP1 on startup (0: no (default), 1: yes)"); MODULE_PARM_DESC(debug_level, - "debug level" - " (0: no debug traces (default), 1: with debug traces)"); + "Amount of debug output (0: none (default), 1: traces, " + "2: some dumps, 3: lots)"); MODULE_PARM_DESC(hw_level, "hardware level" " (0: autosensing (default), 1: v. 0.20, 2: v. 
0.21)"); @@ -275,6 +275,7 @@ static int ehca_sense_attributes(struct ehca_shca *shca) u64 h_ret; struct hipz_query_hca *rblock; struct hipz_query_port *port; + const char *loc_code; static const u32 pgsize_map[] = { HCA_CAP_MR_PGSIZE_4K, 0x1000, @@ -283,6 +284,12 @@ static int ehca_sense_attributes(struct ehca_shca *shca) HCA_CAP_MR_PGSIZE_16M, 0x1000000, }; + ehca_gen_dbg("Probing adapter %s...", + shca->ofdev->node->full_name); + loc_code = of_get_property(shca->ofdev->node, "ibm,loc-code", NULL); + if (loc_code) + ehca_gen_dbg(" ... location lode=%s", loc_code); + rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL); if (!rblock) { ehca_gen_err("Cannot allocate rblock memory."); @@ -567,8 +574,7 @@ static int ehca_destroy_aqp1(struct ehca_sport *sport) static ssize_t ehca_show_debug_level(struct device_driver *ddp, char *buf) { - return snprintf(buf, PAGE_SIZE, "%d\n", - ehca_debug_level); + return snprintf(buf, PAGE_SIZE, "%d\n", ehca_debug_level); } static ssize_t ehca_store_debug_level(struct device_driver *ddp, diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/infiniband/hw/ehca/ehca_mrmw.c index f26997f..46ae4eb 100644 --- a/drivers/infiniband/hw/ehca/ehca_mrmw.c +++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c @@ -1794,8 +1794,9 @@ static int ehca_check_kpages_per_ate(struct scatterlist *page_list, int t; for (t = start_idx; t <= end_idx; t++) { u64 pgaddr = page_to_pfn(sg_page(&page_list[t])) << PAGE_SHIFT; - ehca_gen_dbg("chunk_page=%lx value=%016lx", pgaddr, - *(u64 *)abs_to_virt(phys_to_abs(pgaddr))); + if (ehca_debug_level >= 3) + ehca_gen_dbg("chunk_page=%lx value=%016lx", pgaddr, + *(u64 *)abs_to_virt(phys_to_abs(pgaddr))); if (pgaddr - PAGE_SIZE != *prev_pgaddr) { ehca_gen_err("uncontiguous page found pgaddr=%lx " "prev_pgaddr=%lx page_list_i=%x", @@ -1862,10 +1863,13 @@ static int ehca_set_pagebuf_user2(struct ehca_mr_pginfo *pginfo, pgaddr & ~(pginfo->hwpage_size - 1)); } - ehca_gen_dbg("kpage=%lx chunk_page=%lx " - "value=%016lx", *kpage, 
pgaddr, - *(u64 *)abs_to_virt( - phys_to_abs(pgaddr))); + if (ehca_debug_level >= 3) { + u64 val = *(u64 *)abs_to_virt( + phys_to_abs(pgaddr)); + ehca_gen_dbg("kpage=%lx chunk_page=%lx " + "value=%016lx", + *kpage, pgaddr, val); + } prev_pgaddr = pgaddr; i++; pginfo->kpage_cnt++; diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index 5a653d7..57bef11 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -966,7 +966,7 @@ static int prepare_sqe_rts(struct ehca_qp *my_qp, struct ehca_shca *shca, qp_num, bad_send_wqe_p); /* convert wqe pointer to vadr */ bad_send_wqe_v = abs_to_virt((u64)bad_send_wqe_p); - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(bad_send_wqe_v, 32, "qp_num=%x bad_wqe", qp_num); squeue = &my_qp->ipz_squeue; if (ipz_queue_abs_to_offset(squeue, (u64)bad_send_wqe_p, &q_ofs)) { @@ -979,7 +979,7 @@ static int prepare_sqe_rts(struct ehca_qp *my_qp, struct ehca_shca *shca, wqe = (struct ehca_wqe *)ipz_qeit_calc(squeue, q_ofs); *bad_wqe_cnt = 0; while (wqe->optype != 0xff && wqe->wqef != 0xff) { - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(wqe, 32, "qp_num=%x wqe", qp_num); wqe->nr_of_data_seg = 0; /* suppress data access */ wqe->wqef = WQEF_PURGE; /* WQE to be purged */ @@ -1451,7 +1451,7 @@ static int internal_modify_qp(struct ib_qp *ibqp, /* no support for max_send/recv_sge yet */ } - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(mqpcb, 4*70, "qp_num=%x", ibqp->qp_num); h_ret = hipz_h_modify_qp(shca->ipz_hca_handle, @@ -1766,7 +1766,7 @@ int ehca_query_qp(struct ib_qp *qp, if (qp_init_attr) *qp_init_attr = my_qp->init_attr; - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(qpcb, 4*70, "qp_num=%x", qp->qp_num); query_qp_exit1: @@ -1814,7 +1814,7 @@ int ehca_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, goto modify_srq_exit0; } - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(mqpcb, 4*70, 
"qp_num=%x", my_qp->real_qp_num); h_ret = hipz_h_modify_qp(shca->ipz_hca_handle, my_qp->ipz_qp_handle, @@ -1867,7 +1867,7 @@ int ehca_query_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr) srq_attr->srq_limit = EHCA_BMASK_GET( MQPCB_CURR_SRQ_LIMIT, qpcb->curr_srq_limit); - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(qpcb, 4*70, "qp_num=%x", my_qp->real_qp_num); query_srq_exit1: diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index 0b2359e..bbe0436 100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -81,7 +81,7 @@ static inline int ehca_write_rwqe(struct ipz_queue *ipz_rqueue, recv_wr->sg_list[cnt_ds].length; } - if (ehca_debug_level) { + if (ehca_debug_level >= 3) { ehca_gen_dbg("RECEIVE WQE written into ipz_rqueue=%p", ipz_rqueue); ehca_dmp(wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "recv wqe"); @@ -281,7 +281,7 @@ static inline int ehca_write_swqe(struct ehca_qp *qp, return -EINVAL; } - if (ehca_debug_level) { + if (ehca_debug_level >= 3) { ehca_gen_dbg("SEND WQE written into queue qp=%p ", qp); ehca_dmp( wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "send wqe"); } @@ -459,13 +459,14 @@ int ehca_post_send(struct ib_qp *qp, goto post_send_exit0; } wqe_cnt++; - ehca_dbg(qp->device, "ehca_qp=%p qp_num=%x wqe_cnt=%d", - my_qp, qp->qp_num, wqe_cnt); } /* eof for cur_send_wr */ post_send_exit0: iosync(); /* serialize GAL register access */ hipz_update_sqa(my_qp, wqe_cnt); + if (unlikely(ret || ehca_debug_level >= 2)) + ehca_dbg(qp->device, "ehca_qp=%p qp_num=%x wqe_cnt=%d ret=%i", + my_qp, qp->qp_num, wqe_cnt, ret); my_qp->message_count += wqe_cnt; spin_unlock_irqrestore(&my_qp->spinlock_s, flags); return ret; @@ -525,13 +526,14 @@ static int internal_post_recv(struct ehca_qp *my_qp, goto post_recv_exit0; } wqe_cnt++; - ehca_dbg(dev, "ehca_qp=%p qp_num=%x wqe_cnt=%d", - my_qp, my_qp->real_qp_num, wqe_cnt); } /* eof for cur_recv_wr */ post_recv_exit0: iosync(); /* 
serialize GAL register access */ hipz_update_rqa(my_qp, wqe_cnt); + if (unlikely(ret || ehca_debug_level >= 2)) + ehca_dbg(dev, "ehca_qp=%p qp_num=%x wqe_cnt=%d ret=%i", + my_qp, my_qp->real_qp_num, wqe_cnt, ret); spin_unlock_irqrestore(&my_qp->spinlock_r, flags); return ret; } @@ -575,16 +577,17 @@ static inline int ehca_poll_cq_one(struct ib_cq *cq, struct ib_wc *wc) struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); struct ehca_cqe *cqe; struct ehca_qp *my_qp; - int cqe_count = 0; + int cqe_count = 0, is_error; poll_cq_one_read_cqe: cqe = (struct ehca_cqe *) ipz_qeit_get_inc_valid(&my_cq->ipz_queue); if (!cqe) { ret = -EAGAIN; - ehca_dbg(cq->device, "Completion queue is empty ehca_cq=%p " - "cq_num=%x ret=%i", my_cq, my_cq->cq_number, ret); - goto poll_cq_one_exit0; + if (ehca_debug_level >= 3) + ehca_dbg(cq->device, "Completion queue is empty " + "my_cq=%p cq_num=%x", my_cq, my_cq->cq_number); + goto poll_cq_one_exit0; } /* prevents loads being reordered across this point */ @@ -614,7 +617,7 @@ poll_cq_one_read_cqe: ehca_dbg(cq->device, "Got CQE with purged bit qp_num=%x src_qp=%x", cqe->local_qp_number, cqe->remote_qp_number); - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(cqe, 64, "qp_num=%x src_qp=%x", cqe->local_qp_number, cqe->remote_qp_number); @@ -627,11 +630,13 @@ poll_cq_one_read_cqe: } } - /* tracing cqe */ - if (unlikely(ehca_debug_level)) { + is_error = cqe->status & WC_STATUS_ERROR_BIT; + + /* trace error CQEs if debug_level >= 1, trace all CQEs if >= 3 */ + if (unlikely(ehca_debug_level >= 3 || (ehca_debug_level && is_error))) { ehca_dbg(cq->device, - "Received COMPLETION ehca_cq=%p cq_num=%x -----", - my_cq, my_cq->cq_number); + "Received %sCOMPLETION ehca_cq=%p cq_num=%x -----", + is_error ? "ERROR " : "", my_cq, my_cq->cq_number); ehca_dmp(cqe, 64, "ehca_cq=%p cq_num=%x", my_cq, my_cq->cq_number); ehca_dbg(cq->device, @@ -654,8 +659,9 @@ poll_cq_one_read_cqe: /* update also queue adder to throw away this entry!!! 
*/ goto poll_cq_one_exit0; } + /* eval ib_wc_status */ - if (unlikely(cqe->status & WC_STATUS_ERROR_BIT)) { + if (unlikely(is_error)) { /* complete with errors */ map_ib_wc_status(cqe->status, &wc->status); wc->vendor_err = wc->status; @@ -676,14 +682,6 @@ poll_cq_one_read_cqe: wc->imm_data = cpu_to_be32(cqe->immediate_data); wc->sl = cqe->service_level; - if (unlikely(wc->status != IB_WC_SUCCESS)) - ehca_dbg(cq->device, - "ehca_cq=%p cq_num=%x WARNING unsuccessful cqe " - "OPType=%x status=%x qp_num=%x src_qp=%x wr_id=%lx " - "cqe=%p", my_cq, my_cq->cq_number, cqe->optype, - cqe->status, cqe->local_qp_number, - cqe->remote_qp_number, cqe->work_request_id, cqe); - poll_cq_one_exit0: if (cqe_count > 0) hipz_update_feca(my_cq, cqe_count); diff --git a/drivers/infiniband/hw/ehca/ehca_uverbs.c b/drivers/infiniband/hw/ehca/ehca_uverbs.c index 1b07f2b..e43ed8f 100644 --- a/drivers/infiniband/hw/ehca/ehca_uverbs.c +++ b/drivers/infiniband/hw/ehca/ehca_uverbs.c @@ -211,8 +211,7 @@ static int ehca_mmap_qp(struct vm_area_struct *vma, struct ehca_qp *qp, break; case 1: /* qp rqueue_addr */ - ehca_dbg(qp->ib_qp.device, "qp_num=%x rqueue", - qp->ib_qp.qp_num); + ehca_dbg(qp->ib_qp.device, "qp_num=%x rq", qp->ib_qp.qp_num); ret = ehca_mmap_queue(vma, &qp->ipz_rqueue, &qp->mm_count_rqueue); if (unlikely(ret)) { @@ -224,8 +223,7 @@ static int ehca_mmap_qp(struct vm_area_struct *vma, struct ehca_qp *qp, break; case 2: /* qp squeue_addr */ - ehca_dbg(qp->ib_qp.device, "qp_num=%x squeue", - qp->ib_qp.qp_num); + ehca_dbg(qp->ib_qp.device, "qp_num=%x sq", qp->ib_qp.qp_num); ret = ehca_mmap_queue(vma, &qp->ipz_squeue, &qp->mm_count_squeue); if (unlikely(ret)) { diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c index 7029aa6..5245e13 100644 --- a/drivers/infiniband/hw/ehca/hcp_if.c +++ b/drivers/infiniband/hw/ehca/hcp_if.c @@ -123,8 +123,9 @@ static long ehca_plpar_hcall_norets(unsigned long opcode, int i, sleep_msecs; unsigned long flags = 0; - 
ehca_gen_dbg("opcode=%lx " HCALL7_REGS_FORMAT, - opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7); + if (unlikely(ehca_debug_level >= 2)) + ehca_gen_dbg("opcode=%lx " HCALL7_REGS_FORMAT, + opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7); for (i = 0; i < 5; i++) { /* serialize hCalls to work around firmware issue */ @@ -148,7 +149,8 @@ static long ehca_plpar_hcall_norets(unsigned long opcode, opcode, ret, arg1, arg2, arg3, arg4, arg5, arg6, arg7); else - ehca_gen_dbg("opcode=%lx ret=%li", opcode, ret); + if (unlikely(ehca_debug_level >= 2)) + ehca_gen_dbg("opcode=%lx ret=%li", opcode, ret); return ret; } @@ -172,8 +174,10 @@ static long ehca_plpar_hcall9(unsigned long opcode, int i, sleep_msecs; unsigned long flags = 0; - ehca_gen_dbg("INPUT -- opcode=%lx " HCALL9_REGS_FORMAT, opcode, - arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9); + if (unlikely(ehca_debug_level >= 2)) + ehca_gen_dbg("INPUT -- opcode=%lx " HCALL9_REGS_FORMAT, opcode, + arg1, arg2, arg3, arg4, arg5, + arg6, arg7, arg8, arg9); for (i = 0; i < 5; i++) { /* serialize hCalls to work around firmware issue */ @@ -201,7 +205,7 @@ static long ehca_plpar_hcall9(unsigned long opcode, ret, outs[0], outs[1], outs[2], outs[3], outs[4], outs[5], outs[6], outs[7], outs[8]); - } else + } else if (unlikely(ehca_debug_level >= 2)) ehca_gen_dbg("OUTPUT -- ret=%li " HCALL9_REGS_FORMAT, ret, outs[0], outs[1], outs[2], outs[3], outs[4], outs[5], outs[6], outs[7], @@ -381,7 +385,7 @@ u64 hipz_h_query_port(const struct ipz_adapter_handle adapter_handle, r_cb, /* r6 */ 0, 0, 0, 0); - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(query_port_response_block, 64, "response_block"); return ret; @@ -731,9 +735,6 @@ u64 hipz_h_alloc_resource_mr(const struct ipz_adapter_handle adapter_handle, u64 ret; u64 outs[PLPAR_HCALL9_BUFSIZE]; - ehca_gen_dbg("kernel PAGE_SIZE=%x access_ctrl=%016x " - "vaddr=%lx length=%lx", - (u32)PAGE_SIZE, access_ctrl, vaddr, length); ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, 
adapter_handle.handle, /* r4 */ 5, /* r5 */ @@ -758,7 +759,7 @@ u64 hipz_h_register_rpage_mr(const struct ipz_adapter_handle adapter_handle, { u64 ret; - if (unlikely(ehca_debug_level >= 2)) { + if (unlikely(ehca_debug_level >= 3)) { if (count > 1) { u64 *kpage; int i; -- 1.5.5 From fenkes at de.ibm.com Mon Apr 21 01:06:08 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:06:08 +0100 Subject: [ofa-general] [PATCH 3/5] IB/ehca: Remove mr_largepage parameter In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211006.08849.fenkes@de.ibm.com> Always enable large page support; didn't seem to cause problems for anyone. Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_main.c | 22 +++------------------- 1 files changed, 3 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 4379bef..ab02ac8 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -60,7 +60,6 @@ MODULE_VERSION(HCAD_VERSION); static int ehca_open_aqp1 = 0; static int ehca_hw_level = 0; static int ehca_poll_all_eqs = 1; -static int ehca_mr_largepage = 1; int ehca_debug_level = 0; int ehca_nr_ports = 2; @@ -79,7 +78,6 @@ module_param_named(port_act_time, ehca_port_act_time, int, S_IRUGO); module_param_named(poll_all_eqs, ehca_poll_all_eqs, int, S_IRUGO); module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); module_param_named(scaling_code, ehca_scaling_code, int, S_IRUGO); -module_param_named(mr_largepage, ehca_mr_largepage, int, S_IRUGO); module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); MODULE_PARM_DESC(open_aqp1, @@ -104,9 +102,6 @@ MODULE_PARM_DESC(static_rate, "set permanent static rate (default: disabled)"); MODULE_PARM_DESC(scaling_code, "set scaling code (0: disabled/default, 1: enabled)"); -MODULE_PARM_DESC(mr_largepage, - "use large page for MR 
(0: use PAGE_SIZE (default), " - "1: use large page depending on MR size"); MODULE_PARM_DESC(lock_hcalls, "serialize all hCalls made by the driver " "(default: autodetect)"); @@ -357,11 +352,9 @@ static int ehca_sense_attributes(struct ehca_shca *shca) /* translate supported MR page sizes; always support 4K */ shca->hca_cap_mr_pgsize = EHCA_PAGESIZE; - if (ehca_mr_largepage) { /* support extra sizes only if enabled */ - for (i = 0; i < ARRAY_SIZE(pgsize_map); i += 2) - if (rblock->memory_page_size_supported & pgsize_map[i]) - shca->hca_cap_mr_pgsize |= pgsize_map[i + 1]; - } + for (i = 0; i < ARRAY_SIZE(pgsize_map); i += 2) + if (rblock->memory_page_size_supported & pgsize_map[i]) + shca->hca_cap_mr_pgsize |= pgsize_map[i + 1]; /* query max MTU from first port -- it's the same for all ports */ port = (struct hipz_query_port *)rblock; @@ -663,14 +656,6 @@ static ssize_t ehca_show_adapter_handle(struct device *dev, } static DEVICE_ATTR(adapter_handle, S_IRUGO, ehca_show_adapter_handle, NULL); -static ssize_t ehca_show_mr_largepage(struct device *dev, - struct device_attribute *attr, - char *buf) -{ - return sprintf(buf, "%d\n", ehca_mr_largepage); -} -static DEVICE_ATTR(mr_largepage, S_IRUGO, ehca_show_mr_largepage, NULL); - static struct attribute *ehca_dev_attrs[] = { &dev_attr_adapter_handle.attr, &dev_attr_num_ports.attr, @@ -687,7 +672,6 @@ static struct attribute *ehca_dev_attrs[] = { &dev_attr_cur_mw.attr, &dev_attr_max_pd.attr, &dev_attr_max_ah.attr, - &dev_attr_mr_largepage.attr, NULL }; -- 1.5.5 From vlad at dev.mellanox.co.il Mon Apr 21 01:07:14 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 21 Apr 2008 11:07:14 +0300 Subject: [ofa-general] Re: ofed-1.3 uninstall.sh do not remove all the infiniband stack components properlly on RH 4 u 5 or rh 4 u 6 full instalation. 
In-Reply-To: <39C75744D164D948A170E9792AF8E7CAC5AEED@exil.voltaire.com> References: <47FA3D60.3020905@opengridcomputing.com> <47FAA913.7090805@opengridcomputing.com> <39C75744D164D948A170E9792AF8E7CAC5AEED@exil.voltaire.com> Message-ID: <480C4B32.8050706@dev.mellanox.co.il> Moshe Kazir wrote: > Some rpm's (openmpi-libs, libmthca-devel,etc.) are not removed and > cause dependency problems. > > The attaches patch solves the problem. > > Moshe > > ____________________________________________________________ > Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) > > Voltaire - The Grid Backbone > > www.voltaire.com > Applied. Thanks, Regards, Vladimir From fenkes at de.ibm.com Mon Apr 21 01:06:58 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:06:58 +0100 Subject: [ofa-general] [PATCH 4/5] IB/ehca: Make some module parameters bool, update descriptions In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211006.59197.fenkes@de.ibm.com> Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_main.c | 37 +++++++++++++++---------------- 1 files changed, 18 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index ab02ac8..45fe35a 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -69,41 +69,40 @@ int ehca_static_rate = -1; int ehca_scaling_code = 0; int ehca_lock_hcalls = -1; -module_param_named(open_aqp1, ehca_open_aqp1, int, S_IRUGO); -module_param_named(debug_level, ehca_debug_level, int, S_IRUGO); -module_param_named(hw_level, ehca_hw_level, int, S_IRUGO); -module_param_named(nr_ports, ehca_nr_ports, int, S_IRUGO); -module_param_named(use_hp_mr, ehca_use_hp_mr, int, S_IRUGO); -module_param_named(port_act_time, ehca_port_act_time, int, S_IRUGO); -module_param_named(poll_all_eqs, ehca_poll_all_eqs, int, S_IRUGO); 
-module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); -module_param_named(scaling_code, ehca_scaling_code, int, S_IRUGO); +module_param_named(open_aqp1, ehca_open_aqp1, bool, S_IRUGO); +module_param_named(debug_level, ehca_debug_level, int, S_IRUGO); +module_param_named(hw_level, ehca_hw_level, int, S_IRUGO); +module_param_named(nr_ports, ehca_nr_ports, int, S_IRUGO); +module_param_named(use_hp_mr, ehca_use_hp_mr, bool, S_IRUGO); +module_param_named(port_act_time, ehca_port_act_time, int, S_IRUGO); +module_param_named(poll_all_eqs, ehca_poll_all_eqs, bool, S_IRUGO); +module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); +module_param_named(scaling_code, ehca_scaling_code, bool, S_IRUGO); module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); MODULE_PARM_DESC(open_aqp1, - "AQP1 on startup (0: no (default), 1: yes)"); + "Open AQP1 on startup (default: no)"); MODULE_PARM_DESC(debug_level, "Amount of debug output (0: none (default), 1: traces, " "2: some dumps, 3: lots)"); MODULE_PARM_DESC(hw_level, - "hardware level" - " (0: autosensing (default), 1: v. 0.20, 2: v. 
0.21)"); + "Hardware level (0: autosensing (default), " + "0x10..0x14: eHCA, 0x20..0x23: eHCA2)"); MODULE_PARM_DESC(nr_ports, "number of connected ports (-1: autodetect, 1: port one only, " "2: two ports (default)"); MODULE_PARM_DESC(use_hp_mr, - "high performance MRs (0: no (default), 1: yes)"); + "Use high performance MRs (default: no)"); MODULE_PARM_DESC(port_act_time, - "time to wait for port activation (default: 30 sec)"); + "Time to wait for port activation (default: 30 sec)"); MODULE_PARM_DESC(poll_all_eqs, - "polls all event queues periodically" - " (0: no, 1: yes (default))"); + "Poll all event queues periodically (default: yes)"); MODULE_PARM_DESC(static_rate, - "set permanent static rate (default: disabled)"); + "Set permanent static rate (default: no static rate)"); MODULE_PARM_DESC(scaling_code, - "set scaling code (0: disabled/default, 1: enabled)"); + "Enable scaling code (default: no)"); MODULE_PARM_DESC(lock_hcalls, - "serialize all hCalls made by the driver " + "Serialize all hCalls made by the driver " "(default: autodetect)"); DEFINE_RWLOCK(ehca_qp_idr_lock); -- 1.5.5 From fenkes at de.ibm.com Mon Apr 21 01:08:16 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:08:16 +0100 Subject: [ofa-general] [PATCH 5/5] IB/ehca: Bump version number to 0026 In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211008.17023.fenkes@de.ibm.com> Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_main.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 45fe35a..6504897 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -50,7 +50,7 @@ #include "ehca_tools.h" #include "hcp_if.h" -#define HCAD_VERSION "0025" +#define HCAD_VERSION "0026" MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); -- 
1.5.5 From fenkes at de.ibm.com Mon Apr 21 01:45:25 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:45:25 +0100 Subject: [ofa-general] Re: [PATCH 1/5] IB/ehca: Prevent posting of SQ WQEs if QP not in RTS In-Reply-To: <200804211004.44666.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> <200804211004.44666.fenkes@de.ibm.com> Message-ID: <200804211045.26183.fenkes@de.ibm.com> On Monday 21 April 2008 10:04, Joachim Fenkes wrote: > + if (unlikely(my_qp->state != IB_QPS_RTS)) { > + ehca_err(qp->device, "QP not in RTS state qpn=%x", qp->qp_num); > + return -EINVAL; > + } Myself, I'm not very happy with using EINVAL, but I can't think of a more fitting return code. Also, this is what nes, amso and cxgb3 return in such a case; ipath posts an error CQE and mthca/mlx4 don't do this check at all (AFAICS). Better suggestions, anyone? Regards, Joachim From tziporet at dev.mellanox.co.il Mon Apr 21 04:45:34 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 21 Apr 2008 14:45:34 +0300 Subject: [ofa-general] Re: [ewg] mlx4_core internal error with OFED 1.2.5.4 In-Reply-To: <1208442608.26936.143.camel@hrosenstock-ws.xsigo.com> References: <1208442608.26936.143.camel@hrosenstock-ws.xsigo.com> Message-ID: <480C7E5E.8090703@mellanox.co.il> Hal Rosenstock wrote: > Hi, > > I'm running OFED 1.2.5.4 and got the following: > > Is there any more information that can be provided by decoding this as > to what the error was ? Thanks. > > Hi Hal, I will forward this info to our FW developers. Which FW version you are using? What have you run when this happened? 
Thanks, Tziporet From erezz at Voltaire.COM Mon Apr 21 06:51:52 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Mon, 21 Apr 2008 16:51:52 +0300 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <20080213195912.GC7372@osc.edu> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu> <1202850645.3137.132.camel@localhost.localdomain> <20080212214632.GA14397@osc.edu> <1202853468.3137.148.camel@localhost.localdomain> <20080213195912.GC7372@osc.edu> Message-ID: <480C9BF8.9050401@Voltaire.COM> Pete Wyckoff wrote: > James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 15:57 -0600: > >> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote: >> >>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 15:10 -0600: >>> >>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote: >>>> >>>>> iscsi_iser does not have any hardware DMA restrictions. Add a >>>>> slave_configure function to remove any DMA alignment restriction, >>>>> allowing the use of direct IO from arbitrary offsets within a page. >>>>> Also disable page bouncing; iser has no restrictions on which pages it >>>>> can address. >>>>> >>>>> Signed-off-by: Pete Wyckoff >>>>> --- >>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ >>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>> >>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>> index be1b9fb..1b272a6 100644 >>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) >>>>> iser_conn_terminate(ib_conn); >>>>> } >>>>> >>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) >>>>> +{ >>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); >>>>> >>>> You really don't want to do this. 
That signals to the block layer that >>>> we have an iommu, although it's practically the same thing as a 64 bit >>>> DMA mask ... but I'd just leave it to the DMA mask to set this up >>>> correctly. Anything else is asking for a subtle bug to turn up years >>>> from now when something causes the mask and the limit to be mismatched. >>>> >>> Oh. I decided to add that line for symmetry with TCP, and was >>> convinced by the arguments here: >>> >>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 >>> Author: Mike Christie >>> Date: Thu Jul 26 12:46:47 2007 -0500 >>> >>> [SCSI] iscsi_tcp: Turn off bounce buffers >>> >>> It was found by LSI that on setups with large amounts of memory >>> we were bouncing buffers when we did not need to. If the iscsi tcp >>> code touches the data buffer (or a helper does), >>> it will kmap the buffer. iscsi_tcp also does not interact with hardware, >>> so it does not have any hw dma restrictions. This patch sets the bounce >>> buffer settings for our device queue so buffers should not be bounced >>> because of a driver limit. >>> >>> I don't see a convenient place to callback into particular iscsi >>> devices to set the DMA mask per-host. It has to go on the >>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which >>> handles its DMA mask during device probe. >>> >> You should be taking your mask from the underlying infiniband device as >> part of the setup, shouldn't you? >> > > I think you're right about this. All the existing IB HW tries to > set a 64-bit dma mask, but that's no reason to disable the mechanism > entirely in iser. I'll remove that line that disables bouncing in > my patch. Perhaps Mike will know if the iscsi_tcp usage is still > appropriate. > > Let me make sure that I understand: you say that the IB HW driver (e.g. 
ib_mthca) tries to set a 64-bit dma mask: err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); if (err) { dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA mask.\n"); err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); if (err) { dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); goto err_free_res; } } So, in the example above, the driver will use a 64-bit mask or a 32-bit mask (or fail). According to that, iSER (and SRP) needs to call blk_queue_bounce_limit with the appropriate parameter, right? Thanks, Erez From bob.kossey at hp.com Mon Apr 21 06:58:03 2008 From: bob.kossey at hp.com (Kossey, Robert) Date: Mon, 21 Apr 2008 09:58:03 -0400 Subject: [ofa-general] Starting openibd before the network service In-Reply-To: <480C9AFB.4050801@hp.com> References: <480C9AFB.4050801@hp.com> Message-ID: <480C9D6B.2090906@hp.com> Hi Moshe, You may be aware that Voltaire OFED changed the start order of openibd to be before network to fix a problem that an IB bond device would not come up correctly after a reboot. I know I've seen that with Red Hat. I would like to see that fixed in OFED 1.3.1, as well as the panics I reported with IPoIB: https://bugs.openfabrics.org/show_bug.cgi?id=989 Bob >> /From bonding and ipoib point of view, it's better to have openibd > /started before the network service is started . > > In the openibd script we find that in SUSE network service is started > before openibd -> > > ### BEGIN INIT INFO > # Provides: openibd > # Required-Start: $local_fs $network > > > Can someone explain why ? > > Can we change it before OFED-1.3.1 ? > > Moshe > > From glebn at voltaire.com Mon Apr 21 07:14:41 2008 From: glebn at voltaire.com (Gleb Natapov) Date: Mon, 21 Apr 2008 17:14:41 +0300 Subject: [ofa-general] Problem with libibverbs and huge pages registration. Message-ID: <20080421141441.GF7771@minantech.com> Hi Roland, ibv_reg_mr() fails if I try to register a memory region backed by a huge page, but is not aligned to huge page boundary. 
Digging deeper I see that libibverbs aligns the memory region to a regular page size and calls madvise(), and the call fails. See the program below to reproduce. The program assumes that hugetlbfs is mounted on /huge and there is at least one huge page available. I am not sure it is possible to know whether a memory buffer is backed by a huge page, to solve the problem. Another issue with libibverbs is that after the first ibv_reg_mr() fails, a second registration attempt on the same buffer succeeds, since ibv_madvise_range() doesn't clean up after the madvise failure and thinks that the memory is already "madvised".

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <infiniband/verbs.h>

int main()
{
    int num_devs, fd;
    struct ibv_device **ib_devs;
    struct ibv_context *ctx;
    struct ibv_pd *pd;
    struct ibv_mr *mr;
    char *ptr;
    size_t len = 1024*1024;

    ibv_fork_init();
    ib_devs = ibv_get_device_list(&num_devs);
    ctx = ibv_open_device(ib_devs[0]);
    pd = ibv_alloc_pd(ctx);
    fd = open("/huge/test", O_CREAT | O_RDWR);
    remove("/huge/test");
    ptr = mmap(0, 2*len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    mr = ibv_reg_mr(pd, ptr, len, IBV_ACCESS_LOCAL_WRITE |
                    IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ);
    fprintf(stderr, "mr = %p\n", mr);
    return 0;
}

-- Gleb. From hrosenstock at xsigo.com Mon Apr 21 07:31:43 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 21 Apr 2008 07:31:43 -0700 Subject: [ofa-general] Re: [ewg] mlx4_core internal error with OFED 1.2.5.4 In-Reply-To: <480C7E5E.8090703@mellanox.co.il> References: <1208442608.26936.143.camel@hrosenstock-ws.xsigo.com> <480C7E5E.8090703@mellanox.co.il> Message-ID: <1208788303.18376.126.camel@hrosenstock-ws.xsigo.com> Hi Tziporet, On Mon, 2008-04-21 at 14:45 +0300, Tziporet Koren wrote:
Thanks. > Which FW version you are using? 2.3.0 > What have you run when this happened? I'm not sure it's reproducible but was wondering if there were any clues as to what the internal error was and what could cause it in "theory". -- Hal > Thanks, > Tziporet From vlad at dev.mellanox.co.il Mon Apr 21 07:53:03 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 21 Apr 2008 17:53:03 +0300 Subject: [ofa-general] Re: Starting openibd before the network service In-Reply-To: <39C75744D164D948A170E9792AF8E7CAC5AF06@exil.voltaire.com> References: <4805F692.1040101@dev.mellanox.co.il> <39C75744D164D948A170E9792AF8E7CAC5AF06@exil.voltaire.com> Message-ID: <480CAA4F.7040507@dev.mellanox.co.il> Moshe Kazir wrote: > >>From bonding and ipoib point of view, it's better to have openibd > started before the network service is started . > > In the openibd script we find that in SUSE network service is started > before openibd -> > > ### BEGIN INIT INFO > # Provides: openibd > # Required-Start: $local_fs $network > > > Can someone explain why ? > > Can we change it before OFED-1.3.1 ? > > Moshe > > ____________________________________________________________ > Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) > > Voltaire - The Grid Backbone > > www.voltaire.com > > > Fixed in the OFED-1.3.1. Please check the latest daily build under http://www.openfabrics.org/builds/ofed-1.3.1 Regards, Vladimir From tziporet at mellanox.co.il Mon Apr 21 08:06:07 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 21 Apr 2008 18:06:07 +0300 Subject: [ofa-general] Agenda for the OFED meeting today Message-ID: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Hi, This is the agenda for the OFED meeting today: 1. 
OFED 1.3.1: 1.1 Planned changes: ULPs changes: IB-bonding - done SRP failover - on work SDP crashes - on work RDS fixes for RDMA API - already applied but not clear if these are all the changes librdmacm 1.0.7 - done Open MPI 1.2.6 - done Low level drivers: - each HW vendor should reply when the changes will be ready nes mlx4 cxgb3 Ipath ehca 1.2 Schedule: GA is planned for May-29 I suggest to have only two release candidates: - RC1 - May 6 - RC2 - May 20 Note: daily builds of 1.3.1 are already available at: http://www.openfabrics.org/builds/ofed-1.3.1 2. OFED 1.4: Release features were presented at Sonoma (presentation available at http://www.openfabrics.org/archives/april2008sonoma.htm) Kernel tree is under work at: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel Now failing on ipath drivers - waiting for an update. We should try to get the kernel code to compile as soon as possible so everybody will be able to contribute code 3. Follow up from Sonoma - open discussion Tziporet -------------- next part -------------- An HTML attachment was scrubbed... URL: From gstreiff at NetEffect.com Mon Apr 21 10:10:08 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Mon, 21 Apr 2008 12:10:08 -0500 Subject: [ofa-general] RE: [ewg] Agenda for the OFED meeting today In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795010E@venom2> Hi Tziporet. Apologies for missing the conference call. > Hi, > > This is the agenda for the OFED meeting today: > > Low level drivers: - each HW vendor should reply when the changes will be ready > nes > I think first week of May is likely for my 1.3.1 commits. > 1.2 Schedule: > > GA is planned for May-29 > I suggest to have only two release candidates: > - RC1 - May 6 > - RC2 - May 20 This looks workable to me if this is still the plan. 
Glenn > > Tziporet From olaf.kirch at oracle.com Mon Apr 21 10:18:40 2008 From: olaf.kirch at oracle.com (Olaf Kirch) Date: Mon, 21 Apr 2008 19:18:40 +0200 Subject: [ofa-general] Re: [ewg] Agenda for the OFED meeting today In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Message-ID: <200804211918.40890.olaf.kirch@oracle.com> Hi Tziporet, On Monday 21 April 2008 17:06:07 Tziporet Koren wrote: > RDS fixes for RDMA API - already applied but not > clear if these are all the changes These patches fixed the critical bugs I knew of. So far, this is all that's ready to go in, but if anything else shows up by the end of the first week of May, I'll pipe up. Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
From terrywatson at live.com Mon Apr 21 04:09:49 2008 From: terrywatson at live.com (terry watson) Date: Mon, 21 Apr 2008 11:09:49 +0000 Subject: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: <1208545938.26936.365.camel@hrosenstock-ws.xsigo.com> References: <48084F4E.3020705@cea.fr> <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> <1208545938.26936.365.camel@hrosenstock-ws.xsigo.com> Message-ID: The test system I am looking at uses an Ethernet interconnect for the MPI control channel (i.e. mpirun via ssh/tcp, etc.) and uses the InfiniBand interconnect for the actual MPI communication. The Ethernet interconnect is VLAN'ed between clusters A and B, and therefore mpirun via ssh cannot be used to send the 'out of band' MPI control commands. There are a couple of attack paths focused on the InfiniBand interconnect that I can see (with my limited IB / MPI knowledge) to attempt to demonstrate that the partitioning can be bypassed and data from another partition could be seen or nodes accessed. 1) Attempt to *directly* communicate with another node via MPI (uDAPL?), bypassing the need for mpirun/ssh. 2) Attempt to 'sniff' or dump packets or data from the local HCA that has had its partition membership changed, in an effort to capture data being seen by the HCA. I haven't seen any evidence this is possible via IB. I started getting hopeful that it would be straightforward, as changing partition membership seemed viable. However, things are starting to get a little more complicated :) On the assumption that partition membership can be changed successfully using ibis, I suppose I am simply trying to access another node on the same partition, without any IP access (IPoIB, or TCP/IP for MPI control communication).
Thanks, Dave> Subject: RE: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM?> From: hrosenstock at xsigo.com> To: terrywatson at live.com> CC: philippe.gregoire at cea.fr; general at lists.openfabrics.org> Date: Fri, 18 Apr 2008 12:12:18 -0700> > Terry,> > On Fri, 2008-04-18 at 15:25 +0000, terry watson wrote:> > Thanks Hal. I appreciate using the SM is the correct means of controlling partitioning; however, the testing I am performing is assessing security vulnerabilities. In this case, the two clusters are separated by partitioning only and I am seeking to assess the ability of a user to obtain unauthorised access to one cluster from the other. The requirement for the vendor building the two clusters was that they were isolated from each other. They have chosen to use one switch and I have to assess if this provides adequate isolation, as per the client's security requirements.> > > > At this stage of my investigation, I do not believe partitioning on a switch provides adequate separation / isolation to be used as a security control and two physical switches will need to be used to provide the complete isolation that is required. But my task is to prove this to justify the expense.... :) > > > > I value any comments or input on this topic.> > One pertinent thing here is whether a MKey manager is supported in the> SM, and if so, what level of MKeying is used. Sufficient MKey protection> with a sophisticated manager could make the updates of such PKey tables> difficult but not impossible. Currently, OpenSM does not support an MKey> manager but one is being proposed for the next OFED cycle. Currently,> OpenSM supports a static configured MKey and MKey lease period which> could make things marginally better if you are concerned with rogue> updates like this. Not sure about the third party (vendor) SMs in this> regard. 
Contact your vendor if this is of interest.> > -- Hal> > > ----------------------------------------> > > Subject: Re: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM?> > > From: hrosenstock at xsigo.com> > > To: terrywatson at live.com> > > CC: philippe.gregoire at cea.fr; general at lists.openfabrics.org> > > Date: Fri, 18 Apr 2008 07:37:51 -0700> > > > > > Terry,> > > > > > On Fri, 2008-04-18 at 09:38 +0000, terry watson wrote:> > >> Thanks for the response. The environment I am testing has two clusters and one switch, > > >> with the subnet manager running from the switch. Half the nodes are in one partition and > > >> half in the other (ignoring 0xffff), call them partitions A and B. I have access to one > > >> node in partition A as root and would like to be able to reconfigure that node locally, > > >> and with no access to the switch subnet manager configuration, to be able to access nodes > > >> in partition B.> > > > > > In general, this is not a good idea IMO. As Philippe wrote, the SM (is> > > supposed to) own the writing of those tables (rather than some low level> > > diag utility). Even if you modify the local PKey table, it is possible> > > for the SM to overwrite this. Also, there are several other> > > ramifications of this depending on how the SM deals with partitions.> > > Even if you change things locally, that may not be sufficient as the> > > peer switch port may do partition filtering so that may need to change> > > that too and possible more PKey tables in the network depending on what> > > your SM does. 
Also, there are SA responses that depend on the SM having> > > correct knowledge (like PathRecords and others) so the end node may not> > > get any response on that partition for certain things.> > > > > >> After some reading I believe that IBIS from IBUtils should allow me to alter the > > >> local p_key table and therefore allow me to access nodes on partition B.> > > > > > Yes but it may take more than this for it to work depending on your SM.> > > > > >> I cannot test this until I am on-site and I am formulating a strategy before arrival. > > >> If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. > > > > > > Some MPIs use out of band mechanisms to create connections so the SA> > > issues may not apply there; but I think the partition ones might and are> > > SM dependent so your mileage may vary...> > > > > >> If my approach is flawed I would appreciate it if someone could point this out.> > > > > > The proper way to do this is by reconfiguring your SM.> > > > > > -- Hal> > > > > >> ________________________________> > >>> Date: Fri, 18 Apr 2008 09:35:42 +0200> > >>> From: philippe.gregoire at cea.fr> > >>> To: terrywatson at live.com> > >>> CC: general at lists.openfabrics.org> > >>> Subject: Re: [ofa-general] Is IBIS only for querying OpenSM?> > >>> > > >>> terry watson wrote:> > >>> > > >>> Hi all,> > >>> > > >>> I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch?
Could I then attempt to access a partition that I was originally excluded from accessing?> > >>> > > >>> I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever.> > >>> > > >>> Thanks,> > >>> Dave> > >>> _________________________________________________________________> > >>> Discover the new Windows Vista> > >>> http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________> > >>> general mailing list> > >>> general at lists.openfabrics.org> > >> _________________________________________________________________> > >> News, entertainment and everything you care about at Live.com. Get it now!> > >> http://www.live.com/getstarted.aspx_______________________________________________> > >> general mailing list> > >> general at lists.openfabrics.org> > >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general> > >> > > >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general> > > > > > > _________________________________________________________________> > Connect to the next generation of MSN Messenger > > http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline> _________________________________________________________________ Connect to the next generation of MSN Messenger  http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.kirch at oracle.com Mon Apr 21 12:26:55 2008 From: olaf.kirch at oracle.com (Olaf Kirch) Date: Mon, 21 Apr 2008 21:26:55 +0200 Subject: [ofa-general] Oddities with RDMA CM private data Message-ID: <200804212126.55898.olaf.kirch@oracle.com> I looked into the private_data chunk being exchanged during rdma_cm connection setup today, and there's something odd. I'm sending 8 bytes of data, but in the event handlers I get sizes such as 56, and 196. I haven't tracked it down, but my first suspicion would be that the code in cma.c adds its own private data, but forgets to decrement the data_len fields prior to calling the ULP event handler. Am I misunderstanding the semantics of private_data_len? Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From sean.hefty at intel.com Mon Apr 21 12:34:14 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 21 Apr 2008 12:34:14 -0700 Subject: [ofa-general] Oddities with RDMA CM private data In-Reply-To: <200804212126.55898.olaf.kirch@oracle.com> References: <200804212126.55898.olaf.kirch@oracle.com> Message-ID: <001c01c8a3e6$ae92f010$9b37170a@amr.corp.intel.com> >I looked into the private_data chunk being exchanged during rdma_cm >connection setup today, and there's something odd. I'm sending 8 bytes >of data, but in the event handlers I get sizes such as 56, and 196. >I haven't tracked it down, but my first suspicion would be that the >code in cma.c adds its own private data, but forgets to decrement >the data_len fields prior to calling the ULP event handler. > >Am I misunderstanding the semantics of private_data_len? On the receive side of the rdma_cm, the length of the private data sent by the user is unknown. All that's known is the size of the data that was received. For IB, this includes padded space to make the underlying CM MAD 256 bytes long. 
From the rdma_get_cm_event man page: private_data_len The size of the private data buffer. Users should note that the size of the private data buffer may be larger than the amount of private data sent by the remote side. Any additional space in the buffer will be zeroed out. Basically, there isn't a data_len field that's carried in the connection message. Adding one would have required consuming some of the private data to carry it. - Sean From rdreier at cisco.com Mon Apr 21 14:53:51 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 21 Apr 2008 14:53:51 -0700 Subject: [ofa-general] Problem with libibverbs and huge pages registration. In-Reply-To: <20080421141441.GF7771@minantech.com> (Gleb Natapov's message of "Mon, 21 Apr 2008 17:14:41 +0300") References: <20080421141441.GF7771@minantech.com> Message-ID: > ibv_reg_mr() fails if I try to register a memory region backed by a > huge page, but is not aligned to huge page boundary. Digging deeper I > see that libibverbs aligns memory region to a regular page size and > calls madvise() and the call fails. See program below to reproduce.
> The program assumes that hugetlbfs is mounted on /huge and there is at > least one huge page available. I am not sure it is possible to know whether a > memory buffer is backed by a huge page, to solve the problem. Hmm, not sure off the top of my head how we should deal with this. > Another issue with libibverbs is that after the first ibv_reg_mr() fails, the > second registration attempt of the same buffer succeeds, since > ibv_madvise_range() doesn't clean up after the madvise failure and thinks > that memory is already "madvised". I guess we shouldn't change the refcnt until after we know if madvise has succeeded or not. Does the patch below help? I'm not sure if this is a good enough fix -- we might have split up a node and want to remerge it if the madvise fails... rolling back is a little tricky... I think this will take a little more thought. - R.

--- a/src/memory.c
+++ b/src/memory.c
@@ -506,8 +506,6 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
 			__mm_add(tmp);
 		}

-		node->refcnt += inc;
-
 		if ((inc == -1 && node->refcnt == 0) ||
 		    (inc == 1 && node->refcnt == 1)) {
 			/*
@@ -532,6 +530,8 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
 				goto out;
 			}

+		node->refcnt += inc;
+
 		node = __mm_next(node);
 	}

From ralph.campbell at qlogic.com Mon Apr 21 15:30:03 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Mon, 21 Apr 2008 15:30:03 -0700 Subject: [ofa-general] Re: ofed_kernel git tree for OFED-1.4 (based on 2.6.25-rc7) In-Reply-To: <4805F692.1040101@dev.mellanox.co.il> References: <4805F692.1040101@dev.mellanox.co.il> Message-ID: <1208817003.2232.16.camel@brick.pathscale.com> I have been busier than I thought. I guess the best thing to do is delete the ipath fixes and backport patches for now and then when you pull from 2.6.26, we can create new backport patches and fixes.
On Wed, 2008-04-16 at 15:52 +0300, Vladimir Sokolovsky wrote: > Hi Ralph, > I prepared ofed_kernel git tree: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel. > This tree merged with 2.6.25-rc7. > Currently ofed_scripts/ofed_makedist.sh fails on ipath_0180_header_file_changes_to_support_IBA7220.patch: > > > ./ofed_scripts/ofed_makedist.sh > > git clone -q -s -n /local/scm/ofed-1.4/linux-2.6 /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 > Initialized empty Git repository in /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11/.git/ > pushd /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 > /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 /local/scm/ofed-1.4/linux-2.6 /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_checkout.sh 3bb85a2f1c15d1e58cd8b0b2da0577a3ab98977a > cdbdfc5cc29c4add1a2d6967b137a3347112a199 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > Failed executing /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > Hunk #7 FAILED at 565. > Hunk #8 succeeded at 582 (offset 1 line). > Hunk #9 succeeded at 595 (offset 1 line). > Hunk #10 FAILED at 613. > Hunk #11 succeeded at 719 (offset 2 lines). > Hunk #12 FAILED at 857. > 3 out of 12 hunks FAILED -- rejects in file drivers/infiniband/hw/ipath/ipath_verbs.h > Patch ipath_0180_header_file_changes_to_support_IBA7220.patch does not apply (enforce with -f) > > Failed executing /usr/bin/quiltBuild failed in /tmp/build-ofed_kernel-d23175 See log file /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > > Should ipath patches be removed from the git tree (kernel_patches/fixes/ipath*)? 
> > Regards, > Vladimir > > From sfr at canb.auug.org.au Mon Apr 21 17:24:24 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 22 Apr 2008 10:24:24 +1000 Subject: [ofa-general] [PATCH] infiniband: class_device fallout Message-ID: <20080422102424.51f94b85.sfr@canb.auug.org.au> Signed-off-by: Stephen Rothwell --- drivers/infiniband/hw/ipath/ipath_verbs.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) This patch has been needed in linux-next since April 4 to fix an interaction between the driver-core patches and the infiniband tree. All the parties knew this was necessary. Today, Linus' tree has this build bug. *exasperated sigh*

diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 466f3fb..6ac0c5c 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -2067,7 +2067,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd)
 	dev->phys_port_cnt = 1;
 	dev->num_comp_vectors = 1;
 	dev->dma_device = &dd->pcidev->dev;
-	dev->class_dev.dev = dev->dma_device;
+	dev->dev.parent = dev->dma_device;
 	dev->query_device = ipath_query_device;
 	dev->modify_device = ipath_modify_device;
 	dev->query_port = ipath_query_port;
--
1.5.4.5

-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From rdreier at cisco.com Mon Apr 21 18:26:03 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 21 Apr 2008 18:26:03 -0700 Subject: [ofa-general] [PATCH] infiniband: class_device fallout In-Reply-To: <20080422102424.51f94b85.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Tue, 22 Apr 2008 10:24:24 +1000") References: <20080422102424.51f94b85.sfr@canb.auug.org.au> Message-ID: > This patch has been needed in linux-next since April 4 to fix an > interaction between the driver-core patches and the infiniband tree. All > the parties knew this was necessary. Today, Linus' tree has this build > bug. > > *exasperated sigh* Really sorry...
I must have missed this when it went by, since I was actually unaware of the problem until Greg posted his patches for merging yesterday. But I tried to get this fixed before the patch was merged: http://lkml.org/lkml/2008/4/20/153 Anyway I'll ask Linus to pull my tree with the fix... - R. From rdreier at cisco.com Mon Apr 21 18:26:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 21 Apr 2008 18:26:00 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a few fixes for various things, including one build fix for the ipath driver: Paul Bolle (1): IB/ipath: Fix module parameter description for disable_sma Roland Dreier (6): RDMA/nes: Remove unneeded function declarations IB/ipath: Remove reference to dev->class_dev IB/ipath: Build IBA7220 code unconditionally IB/ipath: Remove dependency on PCI_MSI || HT_IRQ IB/ipath: Remove tests of PCI_MSI in ipath_iba7220.c IB/ipath: Correct capitalization "IntX" -> "INTx" drivers/infiniband/hw/ipath/Kconfig | 2 +- drivers/infiniband/hw/ipath/Makefile | 6 ++++-- drivers/infiniband/hw/ipath/ipath_driver.c | 2 +- drivers/infiniband/hw/ipath/ipath_iba7220.c | 23 +++++++++-------------- drivers/infiniband/hw/ipath/ipath_verbs.c | 3 +-- drivers/infiniband/hw/nes/nes.c | 6 ------ drivers/infiniband/hw/nes/nes_nic.c | 9 --------- 7 files changed, 16 insertions(+), 35 deletions(-)
From swise at opengridcomputing.com Mon Apr 21 19:42:16 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 21 Apr 2008 21:42:16 -0500 Subject: [ofa-general] Re: [ewg] Agenda for the OFED meeting today In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Message-ID: <480D5088.1020005@opengridcomputing.com> Hey Tziporet, Sorry I missed today's call. If possible, I'd like a few weeks to get the cxgb3 fixes tested and ready to go. That puts me around mid may. I'll try and pull that in to make a RC1 of May 6, but I'm thinking I might need another week or so. Steve. Tziporet Koren wrote: > Hi, > > This is the agenda for the OFED meeting today: > 1. OFED 1.3.1: > > 1.1 Planned changes: > > ULPs changes: > > IB-bonding - done > SRP failover - on work > SDP crashes - on work > RDS fixes for RDMA API - already applied but not clear > if these are all the changes > librdmacm 1.0.7 - done > Open MPI 1.2.6 - done > > Low level drivers: - each HW vendor should reply when the > changes will be ready > > nes > mlx4 > cxgb3 > Ipath > ehca > > 1.2 Schedule: > > GA is planned for May-29 > I suggest to have only two release candidates: > - RC1 - May 6 > - RC2 - May 20 > > Note: daily builds of 1.3.1 are already available at: > _http://www.openfabrics.org/builds/ofed-1.3.1_ > > > 2. OFED 1.4: > > Release features were presented at Sonoma (presentation available > at _http://www.openfabrics.org/archives/april2008sonoma.htm_) > > Kernel tree is under work at: > git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel > Now failing on ipath drivers - waiting for an update. > > We should try to get the kernel code to compile as soon as > possible so everybody will be able to contribute code > > 3.
Follow up from Sonoma - open discussion > > > Tziporet > > > ------------------------------------------------------------------------ > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From rusty at rustcorp.com.au Mon Apr 21 22:06:24 2008 From: rusty at rustcorp.com.au (Rusty Russell) Date: Tue, 22 Apr 2008 15:06:24 +1000 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: Message-ID: <200804221506.26226.rusty@rustcorp.com.au> On Wednesday 09 April 2008 01:44:04 Andrea Arcangeli wrote: > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1050,6 +1050,15 @@ > unsigned long addr, unsigned long len, > unsigned long flags, struct page **pages); > > +struct mm_lock_data { > + spinlock_t **i_mmap_locks; > + spinlock_t **anon_vma_locks; > + unsigned long nr_i_mmap_locks; > + unsigned long nr_anon_vma_locks; > +}; > +extern struct mm_lock_data *mm_lock(struct mm_struct * mm); > +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); As far as I can tell you don't actually need to expose this struct at all? > + data->i_mmap_locks = vmalloc(nr_i_mmap_locks * > + sizeof(spinlock_t)); This is why non-typesafe allocators suck. You want 'sizeof(spinlock_t *)' here. > + data->anon_vma_locks = vmalloc(nr_anon_vma_locks * > + sizeof(spinlock_t)); and here. > + err = -EINTR; > + i_mmap_lock_last = NULL; > + nr_i_mmap_locks = 0; > + for (;;) { > + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; > + for (vma = mm->mmap; vma; vma = vma->vm_next) { ... > + data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock; > + } > + data->nr_i_mmap_locks = nr_i_mmap_locks; How about you track your running counter in data->nr_i_mmap_locks, leave nr_i_mmap_locks alone, and BUG_ON(data->nr_i_mmap_locks != nr_i_mmap_locks)? 
Even nicer would be to wrap this in a "get_sorted_mmap_locks()" function. Similarly for anon_vma locks. Unfortunately, I just don't think we can fail locking like this. In your next patch unregistering a notifier can fail because of it: that's not usable. I think it means you need to add a linked list element to the vma for the CONFIG_MMU_NOTIFIER case. Or track the max number of vmas for any mm, and keep a pool to handle mm_lock for this number (ie. if you can't enlarge the pool, fail the vma allocation). Both have their problems though... Rusty. From mashirle at us.ibm.com Mon Apr 21 14:19:23 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Mon, 21 Apr 2008 14:19:23 -0700 Subject: [ofa-general] arp or ip patch to build a neigh permanent entry for IPoIB Message-ID: <1208812763.22166.4.camel@localhost.localdomain> Hello, I am debugging an ipoib ping problem on a cluster. The arp and ip commands don't support using a 20-byte HW address to build a permanent entry manually. Can someone give me a pointer to the patch, if one exists? Thanks in advance! Shirley From olaf.kirch at oracle.com Mon Apr 21 23:03:12 2008 From: olaf.kirch at oracle.com (Olaf Kirch) Date: Tue, 22 Apr 2008 08:03:12 +0200 Subject: [ofa-general] Oddities with RDMA CM private data In-Reply-To: <001c01c8a3e6$ae92f010$9b37170a@amr.corp.intel.com> References: <200804212126.55898.olaf.kirch@oracle.com> <001c01c8a3e6$ae92f010$9b37170a@amr.corp.intel.com> Message-ID: <200804220803.13101.olaf.kirch@oracle.com> On Monday 21 April 2008 21:34:14 Sean Hefty wrote: > On the receive side of the rdma_cm, the length of the private data sent by the > user is unknown. All that's known is the size of the data that was received. > For IB, this includes padded space to make the underlying CM MAD 256 bytes long. > From the rdma_get_cm_event man page: Ah, thanks a lot for clarifying this!
Regards, Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From yevgenyp at mellanox.co.il Mon Apr 21 23:32:00 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:32:00 +0300 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) Message-ID: <480D8660.3060001@mellanox.co.il> >From d0d0ac877ab47f3a8a5f1564e5c48f53245583b9 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 10:10:01 +0300 Subject: [PATCH] mlx4: Moving db management to mlx4_core mlx4_ib is no longer the only customer of mlx4_core. Thus the doorbell allocation was moved to the low level driver (same as buffer allocation). Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 6 +- drivers/infiniband/hw/mlx4/doorbell.c | 131 +-------------------------------- drivers/infiniband/hw/mlx4/main.c | 3 - drivers/infiniband/hw/mlx4/mlx4_ib.h | 33 +------- drivers/infiniband/hw/mlx4/qp.c | 6 +- drivers/infiniband/hw/mlx4/srq.c | 6 +- drivers/net/mlx4/alloc.c | 111 ++++++++++++++++++++++++++++ drivers/net/mlx4/main.c | 3 + drivers/net/mlx4/mlx4.h | 3 + include/linux/mlx4/device.h | 41 ++++++++++ 10 files changed, 175 insertions(+), 168 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 3557e7e..5e570bb 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -204,7 +204,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector uar = &to_mucontext(context)->uar; } else { - err = mlx4_ib_db_alloc(dev, &cq->db, 1); + err = mlx4_db_alloc(dev->dev, &cq->db, 1); if (err) goto err_cq; @@ -250,7 +250,7 @@ err_mtt: err_db: if (!context) - mlx4_ib_db_free(dev, &cq->db); + mlx4_db_free(dev->dev, &cq->db); err_cq: kfree(cq); @@ -435,7 +435,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) ib_umem_release(mcq->umem); } else { 
mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); - mlx4_ib_db_free(dev, &mcq->db); + mlx4_db_free(dev->dev, &mcq->db); } kfree(mcq); diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c index 1c36087..d17b36b 100644 --- a/drivers/infiniband/hw/mlx4/doorbell.c +++ b/drivers/infiniband/hw/mlx4/doorbell.c @@ -34,135 +34,10 @@ #include "mlx4_ib.h" -struct mlx4_ib_db_pgdir { - struct list_head list; - DECLARE_BITMAP(order0, MLX4_IB_DB_PER_PAGE); - DECLARE_BITMAP(order1, MLX4_IB_DB_PER_PAGE / 2); - unsigned long *bits[2]; - __be32 *db_page; - dma_addr_t db_dma; -}; - -static struct mlx4_ib_db_pgdir *mlx4_ib_alloc_db_pgdir(struct mlx4_ib_dev *dev) -{ - struct mlx4_ib_db_pgdir *pgdir; - - pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); - if (!pgdir) - return NULL; - - bitmap_fill(pgdir->order1, MLX4_IB_DB_PER_PAGE / 2); - pgdir->bits[0] = pgdir->order0; - pgdir->bits[1] = pgdir->order1; - pgdir->db_page = dma_alloc_coherent(dev->ib_dev.dma_device, - PAGE_SIZE, &pgdir->db_dma, - GFP_KERNEL); - if (!pgdir->db_page) { - kfree(pgdir); - return NULL; - } - - return pgdir; -} - -static int mlx4_ib_alloc_db_from_pgdir(struct mlx4_ib_db_pgdir *pgdir, - struct mlx4_ib_db *db, int order) -{ - int o; - int i; - - for (o = order; o <= 1; ++o) { - i = find_first_bit(pgdir->bits[o], MLX4_IB_DB_PER_PAGE >> o); - if (i < MLX4_IB_DB_PER_PAGE >> o) - goto found; - } - - return -ENOMEM; - -found: - clear_bit(i, pgdir->bits[o]); - - i <<= o; - - if (o > order) - set_bit(i ^ 1, pgdir->bits[order]); - - db->u.pgdir = pgdir; - db->index = i; - db->db = pgdir->db_page + db->index; - db->dma = pgdir->db_dma + db->index * 4; - db->order = order; - - return 0; -} - -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order) -{ - struct mlx4_ib_db_pgdir *pgdir; - int ret = 0; - - mutex_lock(&dev->pgdir_mutex); - - list_for_each_entry(pgdir, &dev->pgdir_list, list) - if (!mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)) - goto out; - - pgdir = 
mlx4_ib_alloc_db_pgdir(dev); - if (!pgdir) { - ret = -ENOMEM; - goto out; - } - - list_add(&pgdir->list, &dev->pgdir_list); - - /* This should never fail -- we just allocated an empty page: */ - WARN_ON(mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)); - -out: - mutex_unlock(&dev->pgdir_mutex); - - return ret; -} - -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db) -{ - int o; - int i; - - mutex_lock(&dev->pgdir_mutex); - - o = db->order; - i = db->index; - - if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { - clear_bit(i ^ 1, db->u.pgdir->order0); - ++o; - } - - i >>= o; - set_bit(i, db->u.pgdir->bits[o]); - - if (bitmap_full(db->u.pgdir->order1, MLX4_IB_DB_PER_PAGE / 2)) { - dma_free_coherent(dev->ib_dev.dma_device, PAGE_SIZE, - db->u.pgdir->db_page, db->u.pgdir->db_dma); - list_del(&db->u.pgdir->list); - kfree(db->u.pgdir); - } - - mutex_unlock(&dev->pgdir_mutex); -} - -struct mlx4_ib_user_db_page { - struct list_head list; - struct ib_umem *umem; - unsigned long user_virt; - int refcnt; -}; - int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db) + struct mlx4_db *db) { - struct mlx4_ib_user_db_page *page; + struct mlx4_user_db_page *page; struct ib_umem_chunk *chunk; int err = 0; @@ -202,7 +77,7 @@ out: return err; } -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db) +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db) { mutex_lock(&context->db_page_mutex); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 136c76c..3c7f938 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -548,9 +548,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) goto err_uar; MLX4_INIT_DOORBELL_LOCK(&ibdev->uar_lock); - INIT_LIST_HEAD(&ibdev->pgdir_list); - mutex_init(&ibdev->pgdir_mutex); - ibdev->dev = dev; strlcpy(ibdev->ib_dev.name, "mlx4_%d", IB_DEVICE_NAME_MAX); diff 
--git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 9e63732..5cf9947 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -43,24 +43,6 @@ #include #include -enum { - MLX4_IB_DB_PER_PAGE = PAGE_SIZE / 4 -}; - -struct mlx4_ib_db_pgdir; -struct mlx4_ib_user_db_page; - -struct mlx4_ib_db { - __be32 *db; - union { - struct mlx4_ib_db_pgdir *pgdir; - struct mlx4_ib_user_db_page *user_page; - } u; - dma_addr_t dma; - int index; - int order; -}; - struct mlx4_ib_ucontext { struct ib_ucontext ibucontext; struct mlx4_uar uar; @@ -88,7 +70,7 @@ struct mlx4_ib_cq { struct mlx4_cq mcq; struct mlx4_ib_cq_buf buf; struct mlx4_ib_cq_resize *resize_buf; - struct mlx4_ib_db db; + struct mlx4_db db; spinlock_t lock; struct mutex resize_mutex; struct ib_umem *umem; @@ -127,7 +109,7 @@ struct mlx4_ib_qp { struct mlx4_qp mqp; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; struct mlx4_ib_wq rq; u32 doorbell_qpn; @@ -154,7 +136,7 @@ struct mlx4_ib_srq { struct ib_srq ibsrq; struct mlx4_srq msrq; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; u64 *wrid; spinlock_t lock; int head; @@ -175,9 +157,6 @@ struct mlx4_ib_dev { struct mlx4_dev *dev; void __iomem *uar_map; - struct list_head pgdir_list; - struct mutex pgdir_mutex; - struct mlx4_uar priv_uar; u32 priv_pdn; MLX4_DECLARE_DOORBELL_LOCK(uar_lock); @@ -248,11 +227,9 @@ static inline struct mlx4_ib_ah *to_mah(struct ib_ah *ibah) return container_of(ibah, struct mlx4_ib_ah, ibah); } -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order); -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db); int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db); -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db); + struct mlx4_db *db); +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db); 
struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc); int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt, diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index b75efae..80ea8b9 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -514,7 +514,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err; if (!init_attr->srq) { - err = mlx4_ib_db_alloc(dev, &qp->db, 0); + err = mlx4_db_alloc(dev->dev, &qp->db, 0); if (err) goto err; @@ -580,7 +580,7 @@ err_buf: err_db: if (!pd->uobject && !init_attr->srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, &qp->db); err: return err; @@ -666,7 +666,7 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, kfree(qp->rq.wrid); mlx4_buf_free(dev->dev, qp->buf_size, &qp->buf); if (!qp->ibqp.srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, &qp->db); } } diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index beaa3b0..2046197 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -129,7 +129,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, if (err) goto err_mtt; } else { - err = mlx4_ib_db_alloc(dev, &srq->db, 0); + err = mlx4_db_alloc(dev->dev, &srq->db, 0); if (err) goto err_srq; @@ -200,7 +200,7 @@ err_buf: err_db: if (!pd->uobject) - mlx4_ib_db_free(dev, &srq->db); + mlx4_db_free(dev->dev, &srq->db); err_srq: kfree(srq); @@ -267,7 +267,7 @@ int mlx4_ib_destroy_srq(struct ib_srq *srq) kfree(msrq->wrid); mlx4_buf_free(dev->dev, msrq->msrq.max << msrq->msrq.wqe_shift, &msrq->buf); - mlx4_ib_db_free(dev, &msrq->db); + mlx4_db_free(dev->dev, &msrq->db); } kfree(msrq); diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 75ef9d0..43c6d04 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -196,3 +196,114 @@ void mlx4_buf_free(struct mlx4_dev *dev, int size, 
struct mlx4_buf *buf) } } EXPORT_SYMBOL_GPL(mlx4_buf_free); + +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) +{ + struct mlx4_db_pgdir *pgdir; + + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); + if (!pgdir) + return NULL; + + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); + pgdir->bits[0] = pgdir->order0; + pgdir->bits[1] = pgdir->order1; + pgdir->db_page = dma_alloc_coherent(dma_device, PAGE_SIZE, + &pgdir->db_dma, GFP_KERNEL); + if (!pgdir->db_page) { + kfree(pgdir); + return NULL; + } + + return pgdir; +} + +static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir, + struct mlx4_db *db, int order) +{ + int o; + int i; + + for (o = order; o <= 1; ++o) { + i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); + if (i < MLX4_DB_PER_PAGE >> o) + goto found; + } + + return -ENOMEM; + +found: + clear_bit(i, pgdir->bits[o]); + + i <<= o; + + if (o > order) + set_bit(i ^ 1, pgdir->bits[order]); + + db->u.pgdir = pgdir; + db->index = i; + db->db = pgdir->db_page + db->index; + db->dma = pgdir->db_dma + db->index * 4; + db->order = order; + + return 0; +} + +int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_db_pgdir *pgdir; + int ret = 0; + + mutex_lock(&priv->pgdir_mutex); + + list_for_each_entry(pgdir, &priv->pgdir_list, list) + if (!mlx4_alloc_db_from_pgdir(pgdir, db, order)) + goto out; + + pgdir = mlx4_alloc_db_pgdir(&(dev->pdev->dev)); + if (!pgdir) { + ret = -ENOMEM; + goto out; + } + + list_add(&pgdir->list, &priv->pgdir_list); + + /* This should never fail -- we just allocated an empty page: */ + WARN_ON(mlx4_alloc_db_from_pgdir(pgdir, db, order)); + +out: + mutex_unlock(&priv->pgdir_mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(mlx4_db_alloc); + +void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + int o; + int i; + + mutex_lock(&priv->pgdir_mutex); + + o = db->order; + i = db->index; + 
+ if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { + clear_bit(i ^ 1, db->u.pgdir->order0); + ++o; + } + i >>= o; + set_bit(i, db->u.pgdir->bits[o]); + + if (bitmap_full(db->u.pgdir->order1, MLX4_DB_PER_PAGE / 2)) { + dma_free_coherent(&(dev->pdev->dev), PAGE_SIZE, + db->u.pgdir->db_page, db->u.pgdir->db_dma); + list_del(&db->u.pgdir->list); + kfree(db->u.pgdir); + } + + mutex_unlock(&priv->pgdir_mutex); +} +EXPORT_SYMBOL_GPL(mlx4_db_free); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 49a4aca..a6aa49f 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -798,6 +798,9 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&priv->ctx_list); spin_lock_init(&priv->ctx_lock); + INIT_LIST_HEAD(&priv->pgdir_list); + mutex_init(&priv->pgdir_mutex); + /* * Now reset the HCA before we touch the PCI capabilities or * attempt a firmware command, since a boot ROM may have left diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 7333681..a4023c2 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -257,6 +257,9 @@ struct mlx4_priv { struct list_head ctx_list; spinlock_t ctx_lock; + struct list_head pgdir_list; + struct mutex pgdir_mutex; + struct mlx4_fw fw; struct mlx4_cmd cmd; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ff7df1a..9c87dd3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -37,6 +37,8 @@ #include #include +#include + #include enum { @@ -208,6 +210,37 @@ struct mlx4_mtt { int page_shift; }; +enum { + MLX4_DB_PER_PAGE = PAGE_SIZE / 4 +}; + +struct mlx4_db_pgdir { + struct list_head list; + DECLARE_BITMAP(order0, MLX4_DB_PER_PAGE); + DECLARE_BITMAP(order1, MLX4_DB_PER_PAGE / 2); + unsigned long *bits[2]; + __be32 *db_page; + dma_addr_t db_dma; +}; + +struct mlx4_user_db_page { + struct list_head list; + struct ib_umem *umem; + unsigned long user_virt; + int refcnt; +}; + 
+struct mlx4_db { + __be32 *db; + union { + struct mlx4_db_pgdir *pgdir; + struct mlx4_user_db_page *user_page; + } u; + dma_addr_t dma; + int index; + int order; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -341,6 +374,9 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, struct mlx4_buf *buf); +int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order); +void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Mon Apr 21 23:33:57 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:33:57 +0300 Subject: [ofa-general][PATCH] mlx4_core: HW queues resource management (MP support, Patch 2) Message-ID: <480D86D5.30504@mellanox.co.il> >From 3b15a6bba9cb79805198f64985433a33a3a096dc Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 11:06:41 +0300 Subject: [PATCH] mlx4_core: HW queues resource management Added HW queues management API. Wraps buffer and doorbell allocation and mtt write. 
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/alloc.c | 44 +++++++++++++++++++++++++++++++++++++++++++ include/linux/mlx4/device.h | 11 ++++++++++ 2 files changed, 55 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 43c6d04..f36d79e 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -307,3 +307,47 @@ void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db) mutex_unlock(&priv->pgdir_mutex); } EXPORT_SYMBOL_GPL(mlx4_db_free); + +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size, int max_direct) +{ + int err; + + err = mlx4_db_alloc(dev, &wqres->db, 1); + if (err) + return err; + *wqres->db.db = 0; + + if (mlx4_buf_alloc(dev, size, max_direct, &wqres->buf)) { + err = -ENOMEM; + goto err_db; + } + + err = mlx4_mtt_init(dev, wqres->buf.npages, wqres->buf.page_shift, + &wqres->mtt); + if (err) + goto err_buf; + err = mlx4_buf_write_mtt(dev, &wqres->mtt, &wqres->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev, &wqres->mtt); +err_buf: + mlx4_buf_free(dev, size, &wqres->buf); +err_db: + mlx4_db_free(dev, &wqres->db); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_alloc_hwq_res); + +void mlx4_free_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size) +{ + mlx4_mtt_cleanup(dev, &wqres->mtt); + mlx4_buf_free(dev, size, &wqres->buf); + mlx4_db_free(dev, &wqres->db); +} +EXPORT_SYMBOL_GPL(mlx4_free_hwq_res); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index d5fb774..0505732 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -241,6 +241,12 @@ struct mlx4_db { int order; }; +struct mlx4_hwq_resources { + struct mlx4_db db; + struct mlx4_mtt mtt; + struct mlx4_buf buf; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -377,6 +383,11 @@ int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_db_alloc(struct mlx4_dev *dev, struct 
mlx4_db *db, int order); void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db); +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size, int max_direct); +void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, + int size); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Mon Apr 21 23:35:51 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:35:51 +0300 Subject: [ofa-general][PATCH] mlx4: Qp range reservation (MP support, Patch 3) Message-ID: <480D8747.1080108@mellanox.co.il> >From 3978a59af72fddb9b98156a7ecf9018b8bf5b076 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 13:26:14 +0300 Subject: [PATCH] mlx4: Qp range reservation Prior to allocating a qp, one needs to reserve an aligned range of qps. The change is made to enable allocation of consecutive qps.
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/qp.c | 9 +++++ drivers/net/mlx4/alloc.c | 77 ++++++++++++++++++++++++++++++++++++++- drivers/net/mlx4/mlx4.h | 2 + drivers/net/mlx4/qp.c | 44 ++++++++++++++++------- include/linux/mlx4/device.h | 5 ++- 5 files changed, 122 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 80ea8b9..88aae1b 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -544,6 +544,11 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } } + if (!sqpn) + err = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn); + if (err) + goto err_wrid; + err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp); if (err) goto err_wrid; @@ -654,6 +659,10 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, mlx4_ib_unlock_cqs(send_cq, recv_cq); mlx4_qp_free(dev->dev, &qp->mqp); + + if (!is_sqp(dev, qp)) + mlx4_qp_release_range(dev->dev, qp->mqp.qpn, 1); + mlx4_mtt_cleanup(dev->dev, &qp->mtt); if (is_user) { diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index f36d79e..4601506 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -73,7 +73,82 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved) +static unsigned long find_aligned_range(unsigned long *bitmap, + u32 start, u32 nbits, + int len, int align) +{ + unsigned long end, i; + +again: + start = ALIGN(start, align); + while ((start < nbits) && test_bit(start, bitmap)) + start += align; + if (start >= nbits) + return -1; + + end = start+len; + if (end > nbits) + return -1; + for (i = start+1; i < end; i++) { + if (test_bit(i, bitmap)) { + start = i+1; + goto again; + } + } + return start; +} + +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) +{ + u32 obj, i; + + if (likely(cnt == 1 && 
align == 1)) + return mlx4_bitmap_alloc(bitmap); + + spin_lock(&bitmap->lock); + + obj = find_aligned_range(bitmap->table, bitmap->last, + bitmap->max, cnt, align); + if (obj >= bitmap->max) { + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + obj = find_aligned_range(bitmap->table, 0, + bitmap->max, + cnt, align); + } + + if (obj < bitmap->max) { + for (i = 0; i < cnt; i++) + set_bit(obj+i, bitmap->table); + if (obj == bitmap->last) { + bitmap->last = (obj + cnt); + if (bitmap->last >= bitmap->max) + bitmap->last = 0; + } + obj |= bitmap->top; + } else + obj = -1; + + spin_unlock(&bitmap->lock); + + return obj; +} + +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) +{ + u32 i; + + obj &= bitmap->max - 1; + + spin_lock(&bitmap->lock); + for (i = 0; i < cnt; i++) + clear_bit(obj+i, bitmap->table); + bitmap->last = min(bitmap->last, obj); + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + spin_unlock(&bitmap->lock); +} + +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) { int i; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index a4023c2..89d4ccc 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -287,6 +287,8 @@ static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap); void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index fa24e65..dff8e66 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -147,19 +147,42 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, } EXPORT_SYMBOL_GPL(mlx4_qp_modify); -int 
mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp) +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + int qpn; + + qpn = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align); + if (qpn == -1) + return -ENOMEM; + + *base = qpn; + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_reserve_range); + +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + if (base_qpn < dev->caps.sqp_start + 8) + return; + + mlx4_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt); +} +EXPORT_SYMBOL_GPL(mlx4_qp_release_range); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_qp_table *qp_table = &priv->qp_table; int err; - if (sqpn) - qp->qpn = sqpn; - else { - qp->qpn = mlx4_bitmap_alloc(&qp_table->bitmap); - if (qp->qpn == -1) - return -ENOMEM; - } + if (!qpn) + return -EINVAL; + + qp->qpn = qpn; err = mlx4_table_get(dev, &qp_table->qp_table, qp->qpn); if (err) @@ -208,9 +231,6 @@ err_put_qp: mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); err_out: - if (!sqpn) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); - return err; } EXPORT_SYMBOL_GPL(mlx4_qp_alloc); @@ -240,8 +260,6 @@ void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp) mlx4_table_put(dev, &qp_table->auxc_table, qp->qpn); mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); - if (qp->qpn >= dev->caps.sqp_start + 8) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); } EXPORT_SYMBOL_GPL(mlx4_qp_free); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 0505732..9c77bf3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -392,7 +392,10 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct 
mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp); +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp); void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp); int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, -- 1.5.4 From yevgenyp at mellanox.co.il Mon Apr 21 23:38:59 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:38:59 +0300 Subject: [ofa-general][PATCH] mlx4: Prereserved Qp regions (MP support, Patch 4) Message-ID: <480D8803.1050404@mellanox.co.il> >From 2dd4f8abdedda736adca5818c98f7a67d339ba7e Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 14:39:27 +0300 Subject: [PATCH] mlx4: Prereserved Qp regions. We reserve Qp ranges to be used by other modules in case the ports come up as Ethernet ports. The qps are reserved at the end of the QP table. (This way we ensure that they are aligned to their size.) We need to consider these reserved ranges in bitmap creation: the effective_max parameter.
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/alloc.c | 38 ++++++++++++++++-------- drivers/net/mlx4/fw.c | 5 +++ drivers/net/mlx4/fw.h | 2 + drivers/net/mlx4/main.c | 65 +++++++++++++++++++++++++++++++++++++++---- drivers/net/mlx4/mlx4.h | 4 ++ drivers/net/mlx4/qp.c | 55 ++++++++++++++++++++++++++++++++++-- include/linux/mlx4/device.h | 19 ++++++++++++- 7 files changed, 165 insertions(+), 23 deletions(-) diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 4601506..4b6074d 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -44,15 +44,18 @@ u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap) spin_lock(&bitmap->lock); - obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last); - if (obj >= bitmap->max) { + obj = find_next_zero_bit(bitmap->table, bitmap->effective_max, + bitmap->last); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; - obj = find_first_zero_bit(bitmap->table, bitmap->max); + obj = find_first_zero_bit(bitmap->table, bitmap->effective_max); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { set_bit(obj, bitmap->table); - bitmap->last = (obj + 1) & (bitmap->max - 1); + bitmap->last = (obj + 1); + if (bitmap->last == bitmap->effective_max) + bitmap->last = 0; obj |= bitmap->top; } else obj = -1; @@ -108,20 +111,20 @@ u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) spin_lock(&bitmap->lock); obj = find_aligned_range(bitmap->table, bitmap->last, - bitmap->max, cnt, align); - if (obj >= bitmap->max) { + bitmap->effective_max, cnt, align); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; obj = find_aligned_range(bitmap->table, 0, - bitmap->max, + bitmap->effective_max, cnt, align); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { for (i = 0; i < cnt; i++) set_bit(obj+i, bitmap->table); if (obj == bitmap->last) { bitmap->last = (obj + cnt); - if 
(bitmap->last >= bitmap->max) + if (bitmap->last >= bitmap->effective_max) bitmap->last = 0; } obj |= bitmap->top; @@ -147,8 +150,9 @@ void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, - u32 num, u32 mask, u32 reserved) +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max) { int i; @@ -160,6 +164,7 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, bitmap->top = 0; bitmap->max = num; bitmap->mask = mask; + bitmap->effective_max = effective_max; spin_lock_init(&bitmap->lock); bitmap->table = kzalloc(BITS_TO_LONGS(num) * sizeof (long), GFP_KERNEL); if (!bitmap->table) @@ -171,6 +176,13 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, return 0; } +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) +{ + return mlx4_bitmap_init_with_effective_max(bitmap, num, mask, + reserved, num); +} + void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap) { kfree(bitmap->table); diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index d82f275..b0ad0d1 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -325,6 +325,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 +#define QUERY_PORT_MAX_MACVLAN_OFFSET 0x0a #define QUERY_PORT_MAX_VL_OFFSET 0x0b for (i = 1; i <= dev_cap->num_ports; ++i) { @@ -342,6 +343,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); MLX4_GET(field, outbox, QUERY_PORT_MAX_VL_OFFSET); dev_cap->max_vl[i] = field & 0xf; + MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); + dev_cap->log_max_macs[i] = field & 0xf; + dev_cap->log_max_vlans[i] = field >> 4; + } } diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h 
index 306cb9b..a2e827c 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -97,6 +97,8 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 log_max_macs[MLX4_MAX_PORTS + 1]; + u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index a6aa49f..f309532 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -85,6 +85,22 @@ static struct mlx4_profile default_profile = { .num_mtt = 1 << 20, }; +static int num_mac = 1; +module_param_named(num_mac, num_mac, int, 0444); +MODULE_PARM_DESC(num_mac, "Maximum number of MACs per ETH port " + "(1-127, default 1)"); + +static int num_vlan; +module_param_named(num_vlan, num_vlan, int, 0444); +MODULE_PARM_DESC(num_vlan, "Maximum number of VLANs per ETH port " + "(0-126, default 0)"); + +static int use_prio; +module_param_named(use_prio, use_prio, bool, 0444); +MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " + "(0/1, default 0)"); + + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; @@ -134,7 +150,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.max_rq_sg = dev_cap->max_rq_sg; dev->caps.max_wqes = dev_cap->max_qp_sz; dev->caps.max_qp_init_rdma = dev_cap->max_requester_per_qp; - dev->caps.reserved_qps = dev_cap->reserved_qps; dev->caps.max_srq_wqes = dev_cap->max_srq_sz; dev->caps.max_srq_sge = dev_cap->max_rq_sg - 1; dev->caps.reserved_srqs = dev_cap->reserved_srqs; @@ -161,6 +176,39 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.stat_rate_support = dev_cap->stat_rate_support; dev->caps.max_gso_sz = dev_cap->max_gso_sz; + dev->caps.log_num_macs = ilog2(roundup_pow_of_two(num_mac + 1)); + dev->caps.log_num_vlans = ilog2(roundup_pow_of_two(num_vlan + 2)); + dev->caps.log_num_prios = use_prio ? 
3: 0; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { + dev->caps.log_num_macs = dev_cap->log_max_macs[i]; + mlx4_warn(dev, "Requested number of MACs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_macs); + } + if (dev->caps.log_num_vlans > dev_cap->log_max_vlans[i]) { + dev->caps.log_num_vlans = dev_cap->log_max_vlans[i]; + mlx4_warn(dev, "Requested number of VLANs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_vlans); + } + } + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] = dev_cap->reserved_qps; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] = + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] = + (1 << dev->caps.log_num_macs)* + (1 << dev->caps.log_num_vlans)* + (1 << dev->caps.log_num_prios)* + dev->caps.num_ports; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH; + + dev->caps.reserved_qps = dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH]; + return 0; } @@ -209,7 +257,8 @@ static int mlx4_init_cmpt_table(struct mlx4_dev *dev, u64 cmpt_base, ((u64) (MLX4_CMPT_TYPE_QP * cmpt_entry_sz) << MLX4_CMPT_SHIFT), cmpt_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) goto err; @@ -334,7 +383,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->qpc_base, dev_cap->qpc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map QP context memory, aborting.\n"); goto err_unmap_dmpt; @@ -344,7 +394,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->auxc_base, dev_cap->aux_entry_sz, dev->caps.num_qps, - 
dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map AUXC context memory, aborting.\n"); goto err_unmap_qp; @@ -354,7 +405,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->altc_base, dev_cap->altc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map ALTC context memory, aborting.\n"); goto err_unmap_auxc; @@ -364,7 +416,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->rdmarc_base, dev_cap->rdmarc_entry_sz << priv->qp_table.rdmarc_shift, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map RDMARC context memory, aborting\n"); goto err_unmap_altc; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 89d4ccc..b74405a 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -111,6 +111,7 @@ struct mlx4_bitmap { u32 last; u32 top; u32 max; + u32 effective_max; u32 mask; spinlock_t lock; unsigned long *table; @@ -290,6 +291,9 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index dff8e66..2d5be15 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -273,6 +273,7 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) { struct mlx4_qp_table *qp_table = 
&mlx4_priv(dev)->qp_table; int err; + int reserved_from_top = 0; spin_lock_init(&qp_table->lock); INIT_RADIX_TREE(&dev->qp_table_tree, GFP_ATOMIC); @@ -282,9 +283,43 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) * block of special QPs must be aligned to a multiple of 8, so * round up. */ - dev->caps.sqp_start = ALIGN(dev->caps.reserved_qps, 8); - err = mlx4_bitmap_init(&qp_table->bitmap, dev->caps.num_qps, - (1 << 24) - 1, dev->caps.sqp_start + 8); + dev->caps.sqp_start = + ALIGN(dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], 8); + + { + int sort[MLX4_QP_REGION_COUNT]; + int i, j, tmp; + int last_base = dev->caps.num_qps; + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) + sort[i] = i; + + for (i = MLX4_QP_REGION_COUNT; i > 0; --i) { + for (j = 2; j < i; ++j) { + if (dev->caps.reserved_qps_cnt[sort[j]] > + dev->caps.reserved_qps_cnt[sort[j - 1]]) { + tmp = sort[j]; + sort[j] = sort[j - 1]; + sort[j - 1] = tmp; + } + } + } + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) { + last_base -= dev->caps.reserved_qps_cnt[sort[i]]; + dev->caps.reserved_qps_base[sort[i]] = last_base; + reserved_from_top += + dev->caps.reserved_qps_cnt[sort[i]]; + } + + } + + err = mlx4_bitmap_init_with_effective_max(&qp_table->bitmap, + dev->caps.num_qps, + (1 << 23) - 1, + dev->caps.sqp_start + 8, + dev->caps.num_qps - + reserved_from_top); if (err) return err; @@ -297,6 +332,20 @@ void mlx4_cleanup_qp_table(struct mlx4_dev *dev) mlx4_bitmap_cleanup(&mlx4_priv(dev)->qp_table.bitmap); } +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, + int *base_qpn, int *cnt) +{ + if ((region < 0) || (region >= MLX4_QP_REGION_COUNT)) + return -EINVAL; + + *base_qpn = dev->caps.reserved_qps_base[region]; + *cnt = dev->caps.reserved_qps_cnt[region]; + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_get_region); + int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context) { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 
9c77bf3..955eeca 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -135,6 +135,18 @@ enum { MLX4_STAT_RATE_OFFSET = 5 }; +enum qp_region { + MLX4_QP_REGION_FW = 0, + MLX4_QP_REGION_ETH_ADDR, + MLX4_QP_REGION_FC_ADDR, + MLX4_QP_REGION_FC_EXCH, + MLX4_QP_REGION_COUNT +}; + +enum { + MLX4_NUM_FEXCH = 64 * 1024, +}; + static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor) { return (major << 32) | (minor << 16) | subminor; @@ -159,7 +171,6 @@ struct mlx4_caps { int max_rq_desc_sz; int max_qp_init_rdma; int max_qp_dest_rdma; - int reserved_qps; int sqp_start; int num_srqs; int max_srq_wqes; @@ -189,6 +200,12 @@ struct mlx4_caps { u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; int max_gso_sz; + int reserved_qps_cnt[MLX4_QP_REGION_COUNT]; + int reserved_qps; + int reserved_qps_base[MLX4_QP_REGION_COUNT]; + int log_num_macs; + int log_num_vlans; + int log_num_prios; }; struct mlx4_buf_list { -- 1.5.4 From yevgenyp at mellanox.co.il Mon Apr 21 23:49:26 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:49:26 +0300 Subject: [ofa-general][PATCH] mlx4: Different port type support (MP support, Patch 5) Message-ID: <480D8A76.10301@mellanox.co.il> >From 0d3da6ad682c4655cd909aefe5bc294c55f5f711 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 17:40:57 +0300 Subject: [PATCH] mlx4: Different port type support Multiprotocol supports different port types. The port types are delivered through module parameters, crossed with firmware capabilities. Each consumer of mlx4_core should query for supported port types; mlx4_ib can no longer assume that all physical ports belong to it.
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/mad.c | 6 +- drivers/infiniband/hw/mlx4/main.c | 12 ++++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 + drivers/net/mlx4/fw.c | 4 ++ drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/main.c | 84 ++++++++++++++++++++++++++++++++++ include/linux/mlx4/device.h | 32 +++++++++++++ 7 files changed, 136 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 4c1e72f..d91ba56 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -297,7 +297,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) int p, q; int ret; - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) { agent = ib_register_mad_agent(&dev->ib_dev, p + 1, q ? IB_QPT_GSI : IB_QPT_SMI, @@ -313,7 +313,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) return 0; err: - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) if (dev->send_agent[p][q]) ib_unregister_mad_agent(dev->send_agent[p][q]); @@ -326,7 +326,7 @@ void mlx4_ib_mad_cleanup(struct mlx4_ib_dev *dev) struct ib_mad_agent *agent; int p, q; - for (p = 0; p < dev->dev->caps.num_ports; ++p) { + for (p = 0; p < dev->num_ports; ++p) { for (q = 0; q <= 1; ++q) { agent = dev->send_agent[p][q]; dev->send_agent[p][q] = NULL; diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 3c7f938..507dbe3 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -549,11 +549,15 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) MLX4_INIT_DOORBELL_LOCK(&ibdev->uar_lock); ibdev->dev = dev; + ibdev->ports_map = mlx4_get_ports_of_type(dev, MLX4_PORT_TYPE_IB); strlcpy(ibdev->ib_dev.name, "mlx4_%d", IB_DEVICE_NAME_MAX); ibdev->ib_dev.owner = THIS_MODULE; ibdev->ib_dev.node_type = RDMA_NODE_IB_CA; - ibdev->ib_dev.phys_port_cnt = dev->caps.num_ports; + 
ibdev->num_ports = 0; + mlx4_foreach_port(i, ibdev->ports_map) + ibdev->num_ports++; + ibdev->ib_dev.phys_port_cnt = ibdev->num_ports; ibdev->ib_dev.num_comp_vectors = 1; ibdev->ib_dev.dma_device = &dev->pdev->dev; @@ -667,7 +671,7 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr) struct mlx4_ib_dev *ibdev = ibdev_ptr; int p; - for (p = 1; p <= dev->caps.num_ports; ++p) + for (p = 1; p <= ibdev->num_ports; ++p) mlx4_CLOSE_PORT(dev, p); mlx4_ib_mad_cleanup(ibdev); @@ -682,6 +686,10 @@ static void mlx4_ib_event(struct mlx4_dev *dev, void *ibdev_ptr, enum mlx4_dev_event event, int port) { struct ib_event ibev; + struct mlx4_ib_dev *ibdev = to_mdev((struct ib_device *) ibdev_ptr); + + if (port > ibdev->num_ports) + return; switch (event) { case MLX4_DEV_EVENT_PORT_UP: diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 5cf9947..9d4f7a7 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -155,6 +155,8 @@ struct mlx4_ib_ah { struct mlx4_ib_dev { struct ib_device ib_dev; struct mlx4_dev *dev; + u32 ports_map; + int num_ports; void __iomem *uar_map; struct mlx4_uar priv_uar; diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index b0ad0d1..e875b08 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -322,6 +322,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); } } else { +#define QUERY_PORT_SUPPORTED_TYPE_OFFSET 0x00 #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 @@ -334,6 +335,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) if (err) goto out; + MLX4_GET(field, outbox, + QUERY_PORT_SUPPORTED_TYPE_OFFSET); + dev_cap->supported_port_types[i] = field & 3; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); dev_cap->max_mtu[i] = field & 0xf; MLX4_GET(field, outbox, 
QUERY_PORT_WIDTH_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index a2e827c..50a6a7d 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -97,6 +97,7 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 supported_port_types[MLX4_MAX_PORTS + 1]; u8 log_max_macs[MLX4_MAX_PORTS + 1]; u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index f309532..1651d8e 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -100,11 +100,50 @@ module_param_named(use_prio, use_prio, bool, 0444); MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " "(0/1, default 0)"); +static char *port_type_arr[MLX4_MAX_PORTS] = { [0 ... (MLX4_MAX_PORTS-1)] = "ib"}; +module_param_array_named(port_type, port_type_arr, charp, NULL, 0444); +MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " + "comma seperated, default ib for all)"); + +static int mlx4_check_port_params(struct mlx4_dev *dev, + enum mlx4_port_type *port_type) +{ + if (port_type[0] != port_type[1] && + !(dev->caps.flags & MLX4_DEV_CAP_FLAG_DPDP)) { + mlx4_err(dev, "Only same port types supported " + "on this HCA, aborting.\n"); + return -EINVAL; + } + if ((port_type[0] == MLX4_PORT_TYPE_ETH) && + (port_type[1] == MLX4_PORT_TYPE_IB)) { + mlx4_err(dev, "eth-ib configuration is not supported.\n"); + return -EINVAL; + } + return 0; +} + +static void mlx4_str2port_type(char **port_str, + enum mlx4_port_type *port_type) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (!strcmp(port_str[i], "eth")) + port_type[i] = MLX4_PORT_TYPE_ETH; + else + port_type[i] = MLX4_PORT_TYPE_IB; + } +} + + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; int i; + enum mlx4_port_type port_type[MLX4_MAX_PORTS]; + + mlx4_str2port_type(port_type_arr, port_type); err = mlx4_QUERY_DEV_CAP(dev, dev_cap); if (err) { @@ -180,7 +219,24 @@ static int 
mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.log_num_vlans = ilog2(roundup_pow_of_two(num_vlan + 2)); dev->caps.log_num_prios = use_prio ? 3: 0; + err = mlx4_check_port_params(dev, port_type); + if (err) + return err; + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (!dev_cap->supported_port_types[i]) { + mlx4_warn(dev, "FW doesn't support Multi Protocol, " + "loading IB only\n"); + dev->caps.port_type[i] = MLX4_PORT_TYPE_IB; + continue; + } + if (port_type[i-1] & dev_cap->supported_port_types[i]) + dev->caps.port_type[i] = port_type[i-1]; + else { + mlx4_err(dev, "Requested port type for port %d " + "not supported by HW\n", i); + return -ENODEV; + } if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { dev->caps.log_num_macs = dev_cap->log_max_macs[i]; mlx4_warn(dev, "Requested number of MACs is too much " @@ -1004,10 +1060,38 @@ static struct pci_driver mlx4_driver = { .remove = __devexit_p(mlx4_remove_one) }; +static int __init mlx4_verify_params(void) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; ++i) { + if (strcmp(port_type_arr[i], "eth") && + strcmp(port_type_arr[i], "ib")) { + printk(KERN_WARNING "mlx4_core: bad port_type for " + "port %d: %s\n", i, port_type_arr[i]); + return -1; + } + } + if ((num_mac < 1) || (num_mac > 127)) { + printk(KERN_WARNING "mlx4_core: bad num_mac: %d\n", num_mac); + return -1; + } + + if ((num_vlan < 0) || (num_vlan > 126)) { + printk(KERN_WARNING "mlx4_core: bad num_vlan: %d\n", num_vlan); + return -1; + } + + return 0; +} + static int __init mlx4_init(void) { int ret; + if (mlx4_verify_params()) + return -EINVAL; + ret = mlx4_catas_init(); if (ret) return ret; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 955eeca..4279b2f 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -62,6 +62,7 @@ enum { MLX4_DEV_CAP_FLAG_IPOIB_CSUM = 1 << 7, MLX4_DEV_CAP_FLAG_BAD_PKEY_CNTR = 1 << 8, MLX4_DEV_CAP_FLAG_BAD_QKEY_CNTR = 1 << 9, + 
MLX4_DEV_CAP_FLAG_DPDP = 1 << 12, MLX4_DEV_CAP_FLAG_MEM_WINDOW = 1 << 16, MLX4_DEV_CAP_FLAG_APM = 1 << 17, MLX4_DEV_CAP_FLAG_ATOMIC = 1 << 18, @@ -143,6 +144,11 @@ enum qp_region { MLX4_QP_REGION_COUNT }; +enum mlx4_port_type { + MLX4_PORT_TYPE_IB = 1 << 0, + MLX4_PORT_TYPE_ETH = 1 << 1, +}; + enum { MLX4_NUM_FEXCH = 64 * 1024, }; @@ -206,6 +212,7 @@ struct mlx4_caps { int log_num_macs; int log_num_vlans; int log_num_prios; + enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1]; }; struct mlx4_buf_list { @@ -365,6 +372,31 @@ struct mlx4_init_port_param { u64 si_guid; }; +static inline void mlx4_query_steer_cap(struct mlx4_dev *dev, int *log_mac, + int *log_vlan, int *log_prio) +{ + *log_mac = dev->caps.log_num_macs; + *log_vlan = dev->caps.log_num_vlans; + *log_prio = dev->caps.log_num_prios; +} + +static inline u32 mlx4_get_ports_of_type(struct mlx4_dev *dev, + enum mlx4_port_type ptype) +{ + u32 ret = 0; + int i; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.port_type[i] == ptype) + ret |= 1 << (i-1); + } + return ret; +} + +#define mlx4_foreach_port(port, bitmap) \ + for ((port) = 1; (port) <= MLX4_MAX_PORTS; (port)++) \ + if (bitmap & 1 << ((port)-1)) + int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, struct mlx4_buf *buf); void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf); -- 1.5.4 From andrea at qumranet.com Tue Apr 22 00:20:26 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 09:20:26 +0200 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <20080409185500.GT11364@sgi.com> References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> Message-ID: <20080422072026.GM12709@duo.random> This is a followup of the locking of the mmu-notifier methods against the secondary-mmu page fault, each driver can implement differently but this is to show an example of what I planned for KVM, others may follow closely if 
they find this useful. I post this as pseudocode to hide 99% of kvm internal complexities and to focus only on the locking. The KVM locking scheme should be something along these lines:

invalidate_range_start {
    spin_lock(&kvm->mmu_lock);

    kvm->invalidate_range_count++;
    rmap-invalidate of sptes in range

    spin_unlock(&kvm->mmu_lock)
}

invalidate_range_end {
    spin_lock(&kvm->mmu_lock);

    kvm->invalidate_range_count--;

    spin_unlock(&kvm->mmu_lock)
}

invalidate_page {
    spin_lock(&kvm->mmu_lock);

    write_seqlock()
    rmap-invalidate of sptes of page
    write_sequnlock()

    spin_unlock(&kvm->mmu_lock)
}

kvm_page_fault {
    seq = read_seqlock()
    get_user_pages() (aka gfn_to_pfn() in kvm terms)
    spin_lock(&kvm->mmu_lock)
    if (seq_trylock(seq) || kvm->invalidate_range_count)
        goto out; /* replay page fault */
    map sptes and build rmap
out:
    spin_unlock(&kvm->mmu_lock)
}

This will allow us to remove the page pinning from KVM. I'd appreciate it if you, Robin and Christoph, could have a second look and pinpoint any potential issue in my plan.

invalidate_page, as you can notice, makes it possible to decrease the fixed cost overhead from all VM code that works with a single page, where freeing the page _after_ calling invalidate_page has zero runtime/tlb cost. We need invalidate_range_begin/end because when we work on multiple pages, we can reduce cpu utilization and avoid many tlb flushes by holding off the kvm page fault while we work on the range.

invalidate_page also decreases the window where the kvm page fault could possibly need to be replayed (the ptep_clear_flush <-> invalidate_page window is shorter than an invalidate_range_begin(PAGE_SIZE) <-> invalidate_range_end(PAGE_SIZE) window). So even if only as a micro-optimization, it is worth it to decrease the impact on the common VM code. The cost of having both a seqlock and a range_count is irrelevant in kvm terms, as they'll be in the same cacheline and checked at the same time by the page fault, and it won't require any additional blocking (or writing) lock.
Note that the kvm page fault can't happen unless the cpu switches to guest mode, and it can't switch to guest mode if we're in the begin/end critical section, so in theory I could loop inside the page fault too without risking deadlocking, but replaying it by restarting guest mode sounds nicer in sigkill/scheduling terms. Soon I'll release a new mmu notifier patchset with patch 1 being the mmu-notifier-core self-included and ready to go in -mm and mainline in time for 2.6.26. Then I'll be glad to help merge any further patch in the patchset to allow methods to sleep so XPMEM can run on mainline 2.6.27 the same way GRU/KVM/Quadrics will run fine on 2.6.26, in a fully backwards compatible way with 2.6.26 (and of course it doesn't really need to be backwards compatible because this is a kernel internal API only, ask Greg etc... ;). But that will likely require a new config option to avoid hurting AIM performance in fork because the anon_vma critical sections are so short in the fast path.

From ruimario at gmail.com Tue Apr 22 03:09:52 2008
From: ruimario at gmail.com (Rui Machado)
Date: Tue, 22 Apr 2008 12:09:52 +0200
Subject: [ofa-general] beginner resources
Message-ID: <6978b4af0804220309t1ae34185y83ba69f9bbfa309b@mail.gmail.com>

Hi list, is this the right list to ask totally beginner questions (even code snippets) or is there any other resource for this matter? Thank you all, Rui
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From glebn at voltaire.com Tue Apr 22 04:14:13 2008
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 22 Apr 2008 14:14:13 +0300
Subject: [ofa-general] Problem with libibverbs and huge pages registration.
In-Reply-To: References: <20080421141441.GF7771@minantech.com>
Message-ID: <20080422111412.GH7771@minantech.com>

On Mon, Apr 21, 2008 at 02:53:51PM -0700, Roland Dreier wrote:
> > ibv_reg_mr() fails if I try to register a memory region backed by a
> > huge page, but is not aligned to huge page boundary.
Digging deeper I
> > see that libibverbs aligns the memory region to a regular page size and
> > calls madvise() and the call fails. See the program below to reproduce.
> > The program assumes that hugetlbfs is mounted on /huge and there is at
> > least one huge page available. I am not sure it is possible to know if a
> > memory buffer is backed by a huge page to solve the problem.
>
> Hmm, not sure off the top of my head how we should deal with this.

Me too :(

>
> > Another issue with libibverbs is that after the first ibv_reg_mr() fails, the
> > second registration attempt of the same buffer succeeds, since
> > ibv_madvise_range() doesn't clean up after a madvise failure and thinks
> > that memory is already "madvised".
>
> I guess we shouldn't change the refcnt until after we know if madvise
> has succeeded or not. Does the patch below help? I'm not sure if this
> is a good enough fix -- we might have split up a node and want to
> remerge it if the madvise fails... rolling back is a little tricky... I
> think this will take a little more thought.
>
> - R.
>
> --- a/src/memory.c
> +++ b/src/memory.c
> @@ -506,8 +506,6 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
> __mm_add(tmp);
> }
>
> - node->refcnt += inc;
> -
I suppose the "if" below depends on the updated refcnt, so the update can't be
moved down without changing the "if" statement.
> if ((inc == -1 && node->refcnt == 0) ||
> (inc == 1 && node->refcnt == 1)) {
> /*
> @@ -532,6 +530,8 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
> goto out;
> }
>
> + node->refcnt += inc;
> +
> node = __mm_next(node);
> }
--
Gleb.
From andrea at qumranet.com Tue Apr 22 05:00:56 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 14:00:56 +0200
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422072026.GM12709@duo.random>
References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random>
Message-ID: <20080422120056.GR12709@duo.random>

On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote:
> invalidate_range_start {
> spin_lock(&kvm->mmu_lock);
>
> kvm->invalidate_range_count++;
> rmap-invalidate of sptes in range
> write_seqlock; write_sequnlock;
> spin_unlock(&kvm->mmu_lock)
> }
>
> invalidate_range_end {
> spin_lock(&kvm->mmu_lock);
>
> kvm->invalidate_range_count--; write_seqlock; write_sequnlock;
>
> spin_unlock(&kvm->mmu_lock)
> }

Robin correctly pointed out by PM there should be a seqlock in range_begin/end too like corrected above. I guess it's better to use an explicit sequence counter so we avoid an useless spinlock of the write_seqlock (mmu_lock is enough already in all places) and so we can increase it with a single op with +=2 in the range_begin/end. The above is a lower-perf version of the final locking but simpler for reading purposes.
From holt at sgi.com Tue Apr 22 06:01:20 2008
From: holt at sgi.com (Robin Holt)
Date: Tue, 22 Apr 2008 08:01:20 -0500
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422120056.GR12709@duo.random>
References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random> <20080422120056.GR12709@duo.random>
Message-ID: <20080422130120.GR22493@sgi.com>

On Tue, Apr 22, 2008 at 02:00:56PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote:
> > invalidate_range_start {
> > spin_lock(&kvm->mmu_lock);
> >
> > kvm->invalidate_range_count++;
> > rmap-invalidate of sptes in range
> >
>
> write_seqlock; write_sequnlock;

I don't think you need it here since invalidate_range_count is already elevated which will accomplish the same effect.

Thanks,
Robin

From andrea at qumranet.com Tue Apr 22 06:21:43 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 15:21:43 +0200
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422130120.GR22493@sgi.com>
References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random> <20080422120056.GR12709@duo.random> <20080422130120.GR22493@sgi.com>
Message-ID: <20080422132143.GS12709@duo.random>

On Tue, Apr 22, 2008 at 08:01:20AM -0500, Robin Holt wrote:
> On Tue, Apr 22, 2008 at 02:00:56PM +0200, Andrea Arcangeli wrote:
> > On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote:
> > > invalidate_range_start {
> > > spin_lock(&kvm->mmu_lock);
> > >
> > > kvm->invalidate_range_count++;
> > > rmap-invalidate of sptes in range
> > >
> >
> > write_seqlock; write_sequnlock;
>
> I don't think you need it here since invalidate_range_count is already
> elevated which will accomplish the same effect.

Agreed, seqlock only in range_end should be enough. BTW, the fact seqlock is needed regardless of invalidate_page existing or not, really makes invalidate_page a no brainer not just from the core VM point of view, but from the driver point of view too. The kvm_page_fault logic would be the same even if I remove invalidate_page from the mmu notifier patch but it'd run slower both when armed and disarmed.
From holt at sgi.com Tue Apr 22 06:36:04 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 08:36:04 -0500 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <20080422132143.GS12709@duo.random> References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random> <20080422120056.GR12709@duo.random> <20080422130120.GR22493@sgi.com> <20080422132143.GS12709@duo.random> Message-ID: <20080422133604.GN30298@sgi.com> On Tue, Apr 22, 2008 at 03:21:43PM +0200, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 08:01:20AM -0500, Robin Holt wrote: > > On Tue, Apr 22, 2008 at 02:00:56PM +0200, Andrea Arcangeli wrote: > > > On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote: > > > > invalidate_range_start { > > > > spin_lock(&kvm->mmu_lock); > > > > > > > > kvm->invalidate_range_count++; > > > > rmap-invalidate of sptes in range > > > > > > > > > > write_seqlock; write_sequnlock; > > > > I don't think you need it here since invalidate_range_count is already > > elevated which will accomplish the same effect. > > Agreed, seqlock only in range_end should be enough. BTW, the fact I am a little confused about the value of the seq_lock versus a simple atomic, but I assumed there is a reason and left it at that. > seqlock is needed regardless of invalidate_page existing or not, > really makes invalidate_page a no brainer not just from the core VM > point of view, but from the driver point of view too. The > kvm_page_fault logic would be the same even if I remove > invalidate_page from the mmu notifier patch but it'd run slower both > when armed and disarmed. I don't know what you mean by "it'd" run slower and what you mean by "armed and disarmed". For the sake of this discussion, I will assume "it'd" means the kernel in general and not KVM. 
With the two call sites for range_begin/range_end, I would agree we have more call sites, but the second is extremely likely to be cache hot.

By disarmed, I will assume you mean no notifiers registered for a particular mm. In that case, the cache will make the second call effectively free. So, for the disarmed case, I see no measurable difference.

For the case where there is a notifier registered, I certainly can see a difference. I am not certain how to quantify the difference as it depends on the callee. In the case of xpmem, our callout is always very expensive for the _start case. Our _end case is very light, but it is essentially the exact same steps we would perform for the _page callout.

When I was discussing this difference with Jack, he reminded me that the GRU, due to its hardware, does not have any race issues with the invalidate_page callout simply doing the tlb shootdown and not modifying any of its internal structures. He then put a caveat on the discussion that _either_ method was acceptable as far as he was concerned. The real issue is getting a patch in that satisfies all needs and not whether there is a separate invalidate_page callout.

Thanks,
Robin

From tziporet at dev.mellanox.co.il Tue Apr 22 06:44:53 2008
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 22 Apr 2008 16:44:53 +0300
Subject: [ofa-general] Re: [ewg] Agenda for the OFED meeting today
In-Reply-To: <480D5088.1020005@opengridcomputing.com>
References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> <480D5088.1020005@opengridcomputing.com>
Message-ID: <480DEBD5.3030209@mellanox.co.il>

Steve Wise wrote:
>
> Sorry I missed today's call. If possible, I'd like a few weeks to get
> the cxgb3 fixes tested and ready to go. That puts me around mid May.
> I'll try and pull that in to make an RC1 of May 6, but I'm thinking I
> might need another week or so.
>

Please try to make most of the code ready for May 6.
You can add more modifications for RC2 which is May 20.

Tziporet

From andrea at qumranet.com Tue Apr 22 06:48:47 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 15:48:47 +0200
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422133604.GN30298@sgi.com>
References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random> <20080422120056.GR12709@duo.random> <20080422130120.GR22493@sgi.com> <20080422132143.GS12709@duo.random> <20080422133604.GN30298@sgi.com>
Message-ID: <20080422134847.GT12709@duo.random>

On Tue, Apr 22, 2008 at 08:36:04AM -0500, Robin Holt wrote:
> I am a little confused about the value of the seq_lock versus a simple
> atomic, but I assumed there is a reason and left it at that.

There's no value for anything but get_user_pages (get_user_pages takes its own lock internally though). I preferred to explain it as a seqlock because it was simpler for reading, but I totally agree in the final implementation it shouldn't be a seqlock. My code was meant to be pseudo-code only. It doesn't even need to be atomic ;).

> I don't know what you mean by "it'd" run slower and what you mean by
> "armed and disarmed".

1) when armed, the time-window where the kvm-page-fault would be blocked would be a bit larger without invalidate_page, for no good reason

2) if you were to remove invalidate_page, when disarmed the VM code would need two branches instead of one in various places

I don't want to waste cycles if not wasting them improves performance both when armed and disarmed.

> For the sake of this discussion, I will assume "it'd" means the kernel in
> general and not KVM.

With the two call sites for range_begin/range_end, I
So, for the disarmed case, I see no measurable > difference. For rmap is sure effective free, for do_wp_page it costs one branch for no good reason. > For the case where there is a notifier registered, I certainly can see > a difference. I am not certain how to quantify the difference as it Agreed. > When I was discussing this difference with Jack, he reminded me that > the GRU, due to its hardware, does not have any race issues with the > invalidate_page callout simply doing the tlb shootdown and not modifying > any of its internal structures. He then put a caveat on the discussion > that _either_ method was acceptable as far as he was concerned. The real > issue is getting a patch in that satisfies all needs and not whether > there is a seperate invalidate_page callout. Sure, we have that patch now, I'll send it out in a minute, I was just trying to explain why it makes sense to have an invalidate_page too (which remains the only difference by now), removing it would be a regression on all sides, even if a minor one. From tziporet at mellanox.co.il Tue Apr 22 06:59:20 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 22 Apr 2008 16:59:20 +0300 Subject: [ofa-general] OFED April 21 meeting summary In-Reply-To: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> OFED April 21 meeting summary about 1.3.1 plans and OFED 1.4 development: > 1. OFED 1.3.1: > 1.1 Planned changes: > ULPs changes: > IB-bonding - done > SRP failover - on work > SDP crashes - on work > RDS fixes for RDMA API - done > librdmacm 1.0.7 - done > Open MPI 1.2.6 - done uDAPL - on work > Low level drivers: - each HW vendor should reply when the > changes will be ready nes - will be ready on first week of May mlx4 - fixes are ready; changes to support Eth are under review of the submission to kernel so not clear if they will make it on time. cxgb3 - will be ready by middle of may. 
The majority of changes should be submitted for RC1.
ipath - wait for update from Betsy
ehca - wait for update from Christoph

> 1.2 Schedule: we agreed that 2 release candidates should be sufficient
> GA is planned for May-29
> - RC1 - May 6
> - RC2 - May 20
>
> Note: daily builds of 1.3.1 are already available at:
> http://www.openfabrics.org/builds/ofed-1.3.1
>
>
> 2. OFED 1.4:
> Release features were presented at Sonoma (presentation available at
> http://www.openfabrics.org/archives/april2008sonoma.htm)

IPv6: Woody is looking for resources to add IPv6 support to the CMA. Hal noted that it will require a change in opensm too.
Xsigo Vnic & Vhba - Not clear if they will make it
Kernel tree is under work at: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel
We should try to get the kernel code to compile as soon as possible so everybody will be able to contribute code.

Schedule reminder:
==============
Release: Oct 06, 2008
Features freeze: Jun 25, 08 (kernel 2.6.26 based)
Alpha: Jul 9, 08
Beta: Jul 30, 08 kernel 2.6.27-rcX (assuming it will be available)
RC1: Aug 13, 08
RC2: Aug 27, 08
RC3-RC5/6 - every 5-10 days
Latest RC to be used in the OFA interop event
GA: Oct 06 08

> Tziporet
> -------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yevgenyp at mellanox.co.il Tue Apr 22 07:05:38 2008
From: yevgenyp at mellanox.co.il (Yevgeny Petrilin)
Date: Tue, 22 Apr 2008 17:05:38 +0300
Subject: [ofa-general][PATCH] mlx4: Port Ethernet mtu capabilities handle (MP support, Patch 6)
Message-ID: <480DF0B2.3020203@mellanox.co.il>

>From a37cec875c323ddebe4f0289e4bab774fd9ec0f4 Mon Sep 17 00:00:00 2001
From: Yevgeny Petrilin
Date: Tue, 22 Apr 2008 13:25:19 +0300
Subject: [PATCH] mlx4: Port Ethernet mtu capabilities handle

The Ethernet max MTU and the default MAC address are reported through the QUERY_DEV_CAP command. The reported MTU is cross-checked against the requested max MTU (given as a module parameter).
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/fw.c | 11 ++++++----- drivers/net/mlx4/fw.h | 4 +++- drivers/net/mlx4/main.c | 15 ++++++++++++++- include/linux/mlx4/device.h | 4 +++- 4 files changed, 26 insertions(+), 8 deletions(-) diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index e875b08..1cbc30f 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -314,7 +314,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET); dev_cap->max_vl[i] = field >> 4; MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET); - dev_cap->max_mtu[i] = field >> 4; + dev_cap->ib_mtu[i] = field >> 4; dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET); dev_cap->max_gids[i] = 1 << (field & 0xf); @@ -339,7 +339,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) QUERY_PORT_SUPPORTED_TYPE_OFFSET); dev_cap->supported_port_types[i] = field & 3; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); - dev_cap->max_mtu[i] = field & 0xf; + dev_cap->ib_mtu[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET); dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_MAX_GID_PKEY_OFFSET); @@ -350,7 +350,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); dev_cap->log_max_macs[i] = field & 0xf; dev_cap->log_max_vlans[i] = field >> 4; - + dev_cap->eth_mtu[i] = be16_to_cpu(((u16 *) outbox)[1]); + dev_cap->def_mac[i] = be64_to_cpu(((u64 *) outbox)[2]); } } @@ -388,7 +389,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) mlx4_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n", dev_cap->max_cq_sz, dev_cap->max_qp_sz, dev_cap->max_srq_sz); mlx4_dbg(dev, "Local CA ACK delay: %d, max MTU: %d, port width cap: %d\n", - dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu[1], + 
dev_cap->local_ca_ack_delay, 128 << dev_cap->ib_mtu[1], dev_cap->max_port_width[1]); mlx4_dbg(dev, "Max SQ desc size: %d, max SQ S/G: %d\n", dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); @@ -796,7 +797,7 @@ int mlx4_INIT_PORT(struct mlx4_dev *dev, int port) flags |= (dev->caps.port_width_cap[port] & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT; MLX4_PUT(inbox, flags, INIT_PORT_FLAGS_OFFSET); - field = 128 << dev->caps.mtu_cap[port]; + field = 128 << dev->caps.ib_mtu_cap[port]; MLX4_PUT(inbox, field, INIT_PORT_MTU_OFFSET); field = dev->caps.gid_table_len[port]; MLX4_PUT(inbox, field, INIT_PORT_MAX_GID_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index 50a6a7d..ef964d5 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -61,11 +61,13 @@ struct mlx4_dev_cap { int local_ca_ack_delay; int num_ports; u32 max_msg_sz; - int max_mtu[MLX4_MAX_PORTS + 1]; + int ib_mtu[MLX4_MAX_PORTS + 1]; int max_port_width[MLX4_MAX_PORTS + 1]; int max_vl[MLX4_MAX_PORTS + 1]; int max_gids[MLX4_MAX_PORTS + 1]; int max_pkeys[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu[MLX4_MAX_PORTS + 1]; u16 stat_rate_support; u32 flags; int reserved_uars; diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 1651d8e..754c07c 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -104,6 +104,11 @@ static struct mlx4_profile default_profile = { module_param_array_named(port_type, port_type_arr, charp, NULL, 0444); MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " "comma seperated, default ib for all)"); + +static int port_mtu[MLX4_MAX_PORTS] = { [0 ... 
(MLX4_MAX_PORTS-1)] = 9600}; +module_param_array_named(port_mtu, port_mtu, int, NULL, 0444); +MODULE_PARM_DESC(port_mtu, "Ports max mtu in Bytes, entry per port, " + "comma seperated, default 9600 for all"); static int mlx4_check_port_params(struct mlx4_dev *dev, enum mlx4_port_type *port_type) @@ -175,10 +180,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.num_ports = dev_cap->num_ports; for (i = 1; i <= dev->caps.num_ports; ++i) { dev->caps.vl_cap[i] = dev_cap->max_vl[i]; - dev->caps.mtu_cap[i] = dev_cap->max_mtu[i]; + dev->caps.ib_mtu_cap[i] = dev_cap->ib_mtu[i]; dev->caps.gid_table_len[i] = dev_cap->max_gids[i]; dev->caps.pkey_table_len[i] = dev_cap->max_pkeys[i]; dev->caps.port_width_cap[i] = dev_cap->max_port_width[i]; + dev->caps.eth_mtu_cap[i] = dev_cap->eth_mtu[i]; + dev->caps.def_mac[i] = dev_cap->def_mac[i]; } dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE; @@ -237,6 +244,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) "not supported by HW\n", i); return -ENODEV; } + if (port_mtu[i-1] <= dev->caps.eth_mtu_cap[i]) + dev->caps.eth_mtu_cap[i] = port_mtu[i-1]; + else + mlx4_warn(dev, "Requested mtu for port %d is larger " + "then supported, reducing to %d\n", + i, dev->caps.eth_mtu_cap[i]); if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { dev->caps.log_num_macs = dev_cap->log_max_macs[i]; mlx4_warn(dev, "Requested number of MACs is too much " diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 4279b2f..b114ef3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -162,7 +162,9 @@ struct mlx4_caps { u64 fw_ver; int num_ports; int vl_cap[MLX4_MAX_PORTS + 1]; - int mtu_cap[MLX4_MAX_PORTS + 1]; + int ib_mtu_cap[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu_cap[MLX4_MAX_PORTS + 1]; int gid_table_len[MLX4_MAX_PORTS + 1]; int pkey_table_len[MLX4_MAX_PORTS + 1]; int local_ca_ack_delay; -- 1.5.4 From 
yevgenyp at mellanox.co.il Tue Apr 22 07:07:28 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 17:07:28 +0300 Subject: [ofa-general][PATCH] mlx4: Mac Vlan Management (MP support, Patch 7) Message-ID: <480DF120.3010006@mellanox.co.il> >From 93d41d72b8878bfd8d67b6a48b70c392f108fe58 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 14:28:36 +0300 Subject: [PATCH] mlx4: Mac Vlan Management mlx4_core is now responsible for managing Mac and Vlan filters for each port. It also notifies the FW which port type will be loaded, using the SET_PORT command. Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/Makefile | 2 +- drivers/net/mlx4/main.c | 18 +++ drivers/net/mlx4/mlx4.h | 35 ++++++ drivers/net/mlx4/port.c | 278 +++++++++++++++++++++++++++++++++++++++++++ include/linux/mlx4/cmd.h | 9 ++ include/linux/mlx4/device.h | 6 + 6 files changed, 347 insertions(+), 1 deletions(-) create mode 100644 drivers/net/mlx4/port.c diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile index 0952a65..f4932d8 100644 --- a/drivers/net/mlx4/Makefile +++ b/drivers/net/mlx4/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o icm.o intf.o main.o mcg.o \ - mr.o pd.o profile.o qp.o reset.o srq.o + mr.o pd.o profile.o qp.o reset.o srq.o port.o diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 754c07c..a528809 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -678,6 +678,7 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); int err; + int port; err = mlx4_init_uar_table(dev); if (err) { @@ -776,8 +777,25 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) goto err_qp_table_free; } + for (port = 1; port <= dev->caps.num_ports; port++) { + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, aborting\n", + port); + goto err_mcg_table_free; + } + } + + for (port = 
0; port < dev->caps.num_ports; port++) { + mlx4_init_mac_table(dev, port); + mlx4_init_vlan_table(dev, port); + } + return 0; +err_mcg_table_free: + mlx4_cleanup_mcg_table(dev); + err_qp_table_free: mlx4_cleanup_qp_table(dev); diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index b74405a..eff1c5a 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -251,6 +251,35 @@ struct mlx4_catas_err { struct list_head list; }; +struct mlx4_mac_table { +#define MLX4_MAX_MAC_NUM 128 +#define MLX4_MAC_MASK 0xffffffffffff +#define MLX4_MAC_VALID_SHIFT 63 +#define MLX4_MAC_TABLE_SIZE MLX4_MAX_MAC_NUM << 3 + __be64 entries[MLX4_MAX_MAC_NUM]; + int refs[MLX4_MAX_MAC_NUM]; + struct semaphore mac_sem; + int total; + int max; +}; + +struct mlx4_vlan_table { +#define MLX4_MAX_VLAN_NUM 126 +#define MLX4_VLAN_MASK 0xfff +#define MLX4_VLAN_VALID 1 << 31 +#define MLX4_VLAN_TABLE_SIZE MLX4_MAX_VLAN_NUM << 2 + __be32 entries[MLX4_MAX_VLAN_NUM]; + int refs[MLX4_MAX_VLAN_NUM]; + struct semaphore vlan_sem; + int total; + int max; +}; + +struct mlx4_port_info { + struct mlx4_mac_table mac_table; + struct mlx4_vlan_table vlan_table; +}; + struct mlx4_priv { struct mlx4_dev dev; @@ -279,6 +308,7 @@ struct mlx4_priv { struct mlx4_uar driver_uar; void __iomem *kar; + struct mlx4_port_info port[MLX4_MAX_PORTS]; }; static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) @@ -351,4 +381,9 @@ void mlx4_srq_event(struct mlx4_dev *dev, u32 srqn, int event_type); void mlx4_handle_catas_err(struct mlx4_dev *dev); +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port); +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port); + #endif /* MLX4_H */ diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c new file mode 100644 index 0000000..910fc35 --- /dev/null +++ b/drivers/net/mlx4/port.c @@ -0,0 +1,278 @@ +/* + * Copyright (c) 2007 Mellanox Technologies. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include +#include + +#include + +#include "mlx4.h" + +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port].mac_table; + int i; + + sema_init(&table->mac_sem, 1); + for (i = 0; i < MLX4_MAX_MAC_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_macs; + table->total = 0; +} + +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port].vlan_table; + int i; + + sema_init(&table->vlan_sem, 1); + for (i = 0; i < MLX4_MAX_MAC_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_vlans; + table->total = 0; +} + +static int mlx4_SET_PORT_mac_table(struct mlx4_dev *dev, u8 port, + __be64 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_MAC_TABLE_SIZE); + + in_mod = MLX4_SET_PORT_MAC_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} + +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port - 1].mac_table; + int i, err = 0; + int free = -1; + u64 valid = 1; + + mlx4_dbg(dev, "Registering mac : 0x%llx\n", mac); + down(&table->mac_sem); + for (i = 0; i < MLX4_MAX_MAC_NUM - 1; i++) { + if (free < 0 && !table->refs[i]) { + free = i; + continue; + } + + if (mac == (MLX4_MAC_MASK & be64_to_cpu(table->entries[i]))) { + /* Mac already registered, increase refernce count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + mlx4_dbg(dev, "Free mac index is %d\n", free); + + if (table->total == table->max) { + /* No free mac entries */ + err = -ENOSPC; + goto out; + } + + /* Register new MAC */ 
+ table->refs[free] = 1; + table->entries[free] = cpu_to_be64(mac | valid << MLX4_MAC_VALID_SHIFT); + + err = mlx4_SET_PORT_mac_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_err(dev, "Failed adding mac: 0x%llx\n", mac); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->mac_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_mac); + +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port - 1].mac_table; + + down(&table->mac_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No mac entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_warn(dev, "Have more references for index %d," + "no need to modify mac table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_mac_table(dev, port, table->entries); + --table->total; +out: + up(&table->mac_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_mac); + +static int mlx4_SET_PORT_vlan_table(struct mlx4_dev *dev, u8 port, + __be32 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_VLAN_TABLE_SIZE); + in_mod = MLX4_SET_PORT_VLAN_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + + return err; +} + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port - 1].vlan_table; + int i, err = 0; + int free = -1; + + down(&table->vlan_sem); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + if (free < 0 && (table->refs[i] == 0)) { + free = i; + continue; + } + + if (table->refs[i] && + (vlan == (MLX4_VLAN_MASK & + be32_to_cpu(table->entries[i])))) { + /* Vlan already registered, increase 
refernce count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + + if (table->total == table->max) { + /* No free vlan entries */ + err = -ENOSPC; + goto out; + } + + /* Register new MAC */ + table->refs[free] = 1; + table->entries[free] = cpu_to_be32(vlan | MLX4_VLAN_VALID); + + err = mlx4_SET_PORT_vlan_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_warn(dev, "Failed adding vlan: %u\n", vlan); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->vlan_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_vlan); + +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port - 1].vlan_table; + + down(&table->vlan_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No vlan entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_dbg(dev, "Have more references for index %d," + "no need to modify vlan table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_vlan_table(dev, port, table->entries); + --table->total; +out: + up(&table->vlan_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_vlan); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_cmd_mailbox *mailbox; + int err; + u8 is_eth = (dev->caps.port_type[port] == MLX4_PORT_TYPE_ETH) ? 
1 : 0; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memset(mailbox->buf, 0, 256); + if (is_eth) { + ((u8 *) mailbox->buf)[3] = 7; + ((__be16 *) mailbox->buf)[3] = + cpu_to_be16(dev->caps.eth_mtu_cap[port] + + ETH_HLEN + ETH_FCS_LEN); + ((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15); + ((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15); + } + err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h index 77323a7..cf9c679 100644 --- a/include/linux/mlx4/cmd.h +++ b/include/linux/mlx4/cmd.h @@ -132,6 +132,15 @@ enum { MLX4_MAILBOX_SIZE = 4096 }; +enum { + /* set port opcode modifiers */ + MLX4_SET_PORT_GENERAL = 0x0, + MLX4_SET_PORT_RQP_CALC = 0x1, + MLX4_SET_PORT_MAC_TABLE = 0x2, + MLX4_SET_PORT_VLAN_TABLE = 0x3, + MLX4_SET_PORT_PRIO_MAP = 0x4, +}; + struct mlx4_dev; struct mlx4_cmd_mailbox { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index b114ef3..4ca3a00 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -461,6 +461,12 @@ int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port); int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index); +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index); + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index); +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); + int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, -- 1.5.4 From andrea at qumranet.com Tue Apr 22 06:51:16 2008 From: andrea 
at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:16 +0200 Subject: [ofa-general] [PATCH 00 of 12] mmu notifier #v13 Message-ID: Hello, This is the latest and greatest version of the mmu notifier patch, #v13. The changes are mainly in mm_lock, which now uses sort() as suggested by Christoph; this reduces the complexity from O(N**2) to O(N*log(N)). I folded the mm_lock functionality into the mmu-notifier-core 1/12 patch to make it self-contained. I recommend merging 1/12 into -mm/mainline ASAP. Lack of mmu notifiers is holding off KVM development. We are going to rework the way pages are mapped and unmapped to work with pure pfns for PCI passthrough without the use of page pinning, and we can't do that without mmu notifiers. This is not just a performance matter. KVM/GRU and AFAICT Quadrics are all covered by applying the single 1/12 patch, which shall be shipped with 2.6.26. The risk of breakage from applying 1/12 is zero, both when MMU_NOTIFIER=y and when it's =n, so it shouldn't be delayed further. XPMEM support comes with the later patches 2-12; the risk for those patches is >0, which is why the mmu-notifier-core is numbered 1/12 and not 12/12. Some are simple and can go in immediately, but not all are so simple. 2-12/12 are posted as usual for review by the VM developers, so Robin can keep testing them on XPMEM and they can be merged later without any downside (they're mostly orthogonal to 1/12). From andrea at qumranet.com Tue Apr 22 06:51:18 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:18 +0200 Subject: [ofa-general] [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug In-Reply-To: Message-ID: <3c804dca25b15017b220.1208872278@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID 3c804dca25b15017b22008647783d6f5f3801fa9 # Parent ea87c15371b1bd49380c40c3f15f1c7ca4438af5 Fix ia64 compilation failure because of common code include bug.
Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -10,6 +10,7 @@ #include #include #include +#include #include #include From andrea at qumranet.com Tue Apr 22 06:51:19 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:19 +0200 Subject: [ofa-general] [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID a6672bdeead0d41b2ebd6846f731d43a611645b7 # Parent 3c804dca25b15017b22008647783d6f5f3801fa9 get_task_mm should not succeed if mmput() is running and has reduced the mm_users count to zero. This can occur if a processor follows a task's pointer to an mm struct, because that pointer is only cleared after the mmput(). If get_task_mm() succeeds after mmput() has reduced mm_users to zero, then we have the lovely situation that one portion of the kernel is doing all the teardown work for an mm while another portion is happily using it. Signed-off-by: Christoph Lameter diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -442,7 +442,8 @@ if (task->flags & PF_BORROWED_MM) mm = NULL; else - atomic_inc(&mm->mm_users); + if (!atomic_inc_not_zero(&mm->mm_users)) + mm = NULL; } task_unlock(task); return mm; From andrea at qumranet.com Tue Apr 22 06:51:17 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:17 +0200 Subject: [ofa-general] [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208870142 -7200 # Node ID ea87c15371b1bd49380c40c3f15f1c7ca4438af5 # Parent fb3bc9942fb78629d096bd07564f435d51d86e5f Core of mmu notifiers.
Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1050,6 +1050,27 @@ unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +/* + * mm_lock will take mmap_sem writably (to prevent all modifications + * and scanning of vmas) and then also takes the mapping locks for + * each of the vma to lockout any scans of pagetables of this address + * space. This can be used to effectively holding off reclaim from the + * address space. + * + * mm_lock can fail if there is not enough memory to store a pointer + * array to all vmas. + * + * mm_lock and mm_unlock are expensive operations that may take a long time. + */ +struct mm_lock_data { + spinlock_t **i_mmap_locks; + spinlock_t **anon_vma_locks; + size_t nr_i_mmap_locks; + size_t nr_anon_vma_locks; +}; +extern int mm_lock(struct mm_struct *mm, struct mm_lock_data *data); +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -225,6 +225,9 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct hlist_head mmu_notifier_list; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,229 @@ +#ifndef _LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; + +#ifdef CONFIG_MMU_NOTIFIER + +struct mmu_notifier_ops { + /* + * Called after all other threads have 
terminated and the executing + * thread is the only remaining execution thread. There are no + * users of the mm_struct remaining. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired and are called only when the mmap_sem is held and/or + * the semaphores protecting the reverse maps. Both functions + * may sleep. The subsystem must guarantee that no additional + * references to the pages in the range established between + * the call to invalidate_range_start() and the matching call + * to invalidate_range_end(). + * + * Invalidation of multiple concurrent ranges may be permitted + * by the driver or the driver may exclude other invalidation + * from proceeding by blocking on new invalidate_range_start() + * callback that overlap invalidates that are already in + * progress. Either way the establishment of sptes to the + * range can only be allowed if all invalidate_range_stop() + * function have been called. + * + * invalidate_range_start() is called when all pages in the + * range are still mapped and have at least a refcount of one. 
+ * + * invalidate_range_end() is called when all pages in the + * range have been unmapped and the pages have been freed by + * the VM. + * + * The VM will remove the page table entries and potentially + * the page between invalidate_range_start() and + * invalidate_range_end(). If the page must not be freed + * because of pending I/O or other circumstances then the + * invalidate_range_start() callback (or the initial mapping + * by the driver) must make sure that the refcount is kept + * elevated. + * + * If the driver increases the refcount when the pages are + * initially mapped into an address space then either + * invalidate_range_start() or invalidate_range_end() may + * decrease the refcount. If the refcount is decreased on + * invalidate_range_start() then the VM can free pages as page + * table entries are removed. If the refcount is only + * droppped on invalidate_range_end() then the driver itself + * will drop the last refcount but it must take care to flush + * any secondary tlb before doing the final free on the + * page. Pages will no longer be referenced by the linux + * address space but may still be referenced by sptes until + * the last refcount is dropped. + */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +/* + * The notifier chains are protected by mmap_sem and/or the reverse map + * semaphores. Notifier chains are only changed when all reverse maps and + * the mmap_sem locks are taken. + * + * Therefore notifier chains can only be traversed when either + * + * 1. mmap_sem is held. + * 2. One of the reverse map locks is held (i_mmap_sem or anon_vma->sem). + * 3. 
No other concurrent thread can access the list (release) + */ +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(!hlist_empty(&mm->mmu_notifier_list)); +} + +extern int mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern int mmu_notifier_unregister(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ + INIT_HLIST_HEAD(&mm->mmu_notifier_list); +} + +#define 
ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } diff --git a/mm/Kconfig b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ -193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile 
b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,5 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -611,6 +612,9 @@ if (is_vm_hugetlb_page(vma)) return 
copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { @@ -621,6 +625,11 @@ vma, addr, next)) return -ENOMEM; } while (dst_pgd++, src_pgd++, addr = next, addr != end); + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); + return 0; } @@ -825,7 +834,9 @@ unsigned long start = start_addr; spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; int fullmm = (*tlbp)->fullmm; + struct mm_struct *mm = vma->vm_mm; + mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -876,6 +887,7 @@ } } out: + mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ } @@ -1463,10 +1475,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1487,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1689,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,9 @@ #include #include #include +#include +#include +#include #include #include @@ -2038,6 +2041,7 @@ /* mm's last user has gone, and its about to be pulled down */ arch_exit_mmap(mm); + mmu_notifier_release(mm); lru_add_drain(); flush_cache_mm(mm); @@ -2242,3 +2246,143 @@ return 0; } + +static int mm_lock_cmp(const void *a, const void *b) +{ + cond_resched(); + if ((unsigned long)*(spinlock_t **)a < + (unsigned long)*(spinlock_t **)b) + return -1; + else if (a == b) + return 0; + else + return 1; +} + +static unsigned long mm_lock_sort(struct mm_struct *mm, spinlock_t **locks, + int anon) +{ + struct vm_area_struct *vma; + size_t i = 0; + + for (vma = mm->mmap; vma; vma = vma->vm_next) { + if (anon) { + if (vma->anon_vma) + locks[i++] = &vma->anon_vma->lock; + } else { + if (vma->vm_file && vma->vm_file->f_mapping) + locks[i++] = &vma->vm_file->f_mapping->i_mmap_lock; + } + } + + if (!i) + goto out; + + sort(locks, i, sizeof(spinlock_t *), mm_lock_cmp, NULL); + +out: + return i; +} + +static inline unsigned long mm_lock_sort_anon_vma(struct mm_struct *mm, + spinlock_t **locks) +{ + return mm_lock_sort(mm, locks, 1); +} + +static inline unsigned long mm_lock_sort_i_mmap(struct mm_struct *mm, + spinlock_t **locks) +{ + return mm_lock_sort(mm, locks, 0); +} + +static void mm_lock_unlock(spinlock_t **locks, size_t nr, int lock) +{ + spinlock_t *last = NULL; + size_t i; + + for (i = 0; i < nr; i++) + /* Multiple vmas may use the same lock. 
*/ + if (locks[i] != last) { + BUG_ON((unsigned long) last > (unsigned long) locks[i]); + last = locks[i]; + if (lock) + spin_lock(last); + else + spin_unlock(last); + } +} + +static inline void __mm_lock(spinlock_t **locks, size_t nr) +{ + mm_lock_unlock(locks, nr, 1); +} + +static inline void __mm_unlock(spinlock_t **locks, size_t nr) +{ + mm_lock_unlock(locks, nr, 0); +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. + */ +int mm_lock(struct mm_struct *mm, struct mm_lock_data *data) +{ + spinlock_t **anon_vma_locks, **i_mmap_locks; + + down_write(&mm->mmap_sem); + if (mm->map_count) { + anon_vma_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); + if (unlikely(!anon_vma_locks)) { + up_write(&mm->mmap_sem); + return -ENOMEM; + } + + i_mmap_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); + if (unlikely(!i_mmap_locks)) { + up_write(&mm->mmap_sem); + vfree(anon_vma_locks); + return -ENOMEM; + } + + data->nr_anon_vma_locks = mm_lock_sort_anon_vma(mm, anon_vma_locks); + data->nr_i_mmap_locks = mm_lock_sort_i_mmap(mm, i_mmap_locks); + + if (data->nr_anon_vma_locks) { + __mm_lock(anon_vma_locks, data->nr_anon_vma_locks); + data->anon_vma_locks = anon_vma_locks; + } else + vfree(anon_vma_locks); + + if (data->nr_i_mmap_locks) { + __mm_lock(i_mmap_locks, data->nr_i_mmap_locks); + data->i_mmap_locks = i_mmap_locks; + } else + vfree(i_mmap_locks); + } + return 0; +} + +static void mm_unlock_vfree(spinlock_t **locks, size_t nr) +{ + __mm_unlock(locks, nr); + vfree(locks); +} + +/* avoid memory allocations for mm_unlock to prevent deadlock */ +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) +{ + if (mm->map_count) { + if (data->nr_anon_vma_locks) + 
mm_unlock_vfree(data->anon_vma_locks, + data->nr_anon_vma_locks); + if (data->i_mmap_locks) + mm_unlock_vfree(data->i_mmap_locks, + data->nr_i_mmap_locks); + } + up_write(&mm->mmap_sem); +} diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,130 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include +#include +#include +#include + +/* + * No synchronization. This function can only be called when only a single + * process remains that performs teardown. + */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { + mn = hlist_entry(mm->mmu_notifier_list.first, + struct mmu_notifier, + hlist); + hlist_del(&mn->hlist); + if (mn->ops->release) + mn->ops->release(mn, mm); + } +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending if the mapping previously + * existed or not. 
+ */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, address); + } + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, start, end); + } +} + +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, start, end); + } +} + +/* + * Must not hold mmap_sem nor any other VM related lock when calling + * this registration function. + */ +int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data data; + int ret; + + ret = mm_lock(mm, &data); + if (unlikely(ret)) + goto out; + hlist_add_head(&mn->hlist, &mm->mmu_notifier_list); + mm_unlock(mm, &data); +out: + return ret; +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); + +/* + * mm_users can't go down to zero while mmu_notifier_unregister() + * runs or it can race with ->release. So a mm_users pin must + * be taken by the caller (if mm can be different from current->mm). 
+ */ +int mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data data; + int ret; + + BUG_ON(!atomic_read(&mm->mm_users)); + + ret = mm_lock(mm, &data); + if (unlikely(ret)) + goto out; + hlist_del(&mn->hlist); + mm_unlock(mm, &data); +out: + return ret; +} +EXPORT_SYMBOL_GPL(mmu_notifier_unregister); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) 
referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) From andrea at qumranet.com Tue Apr 22 06:51:20 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:20 +0200 Subject: [ofa-general] [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID ac9bb1fb3de2aa5d27210a28edf24f6577094076 # Parent a6672bdeead0d41b2ebd6846f731d43a611645b7 Moves all mmu notifier methods outside the PT lock (first and not last step to make them sleep capable). 
Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -169,27 +169,6 @@ INIT_HLIST_HEAD(&mm->mmu_notifier_list); } -#define ptep_clear_flush_notify(__vma, __address, __ptep) \ -({ \ - pte_t __pte; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __pte = ptep_clear_flush(___vma, ___address, __ptep); \ - mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ - __pte; \ -}) - -#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ -({ \ - int __young; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ - __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ - ___address); \ - __young; \ -}) - #else /* CONFIG_MMU_NOTIFIER */ static inline void mmu_notifier_release(struct mm_struct *mm) @@ -221,9 +200,6 @@ { } -#define ptep_clear_flush_young_notify ptep_clear_flush_young -#define ptep_clear_flush_notify ptep_clear_flush - #endif /* CONFIG_MMU_NOTIFIER */ #endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,11 +194,13 @@ if (pte) { /* Nuke the page table entry. 
*/ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); pte_unmap_unlock(pte, ptl); + /* must invalidate_page _before_ freeing the page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(page); } } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -1627,9 +1627,10 @@ */ page_table = pte_offset_map_lock(mm, pmd, address, &ptl); - page_cache_release(old_page); + new_page = NULL; if (!pte_same(*page_table, orig_pte)) goto unlock; + page_cache_release(old_page); page_mkwrite = 1; } @@ -1645,6 +1646,7 @@ if (ptep_set_access_flags(vma, address, page_table, entry,1)) update_mmu_cache(vma, address, entry); ret |= VM_FAULT_WRITE; + old_page = new_page = NULL; goto unlock; } @@ -1689,7 +1691,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush_notify(vma, address, page_table); + ptep_clear_flush(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); @@ -1701,12 +1703,18 @@ } else mem_cgroup_uncharge_page(new_page); - if (new_page) +unlock: + pte_unmap_unlock(page_table, ptl); + + if (new_page) { + if (new_page == old_page) + /* cow happened, notify before releasing old_page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(new_page); + } if (old_page) page_cache_release(old_page); -unlock: - pte_unmap_unlock(page_table, ptl); + if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -275,7 +275,7 @@ unsigned long address; pte_t *pte; spinlock_t *ptl; - int referenced = 0; + int referenced = 0, clear_flush_young = 0; address = vma_address(page, vma); if (address == -EFAULT) @@ -288,8 +288,11 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young_notify(vma, address, pte)) - referenced++; + } else { + clear_flush_young = 1; + if (ptep_clear_flush_young(vma, address, pte)) + referenced++; + } /* Pretend the page is referenced if the task has the swap token and is in the middle of a page fault. 
*/ @@ -299,6 +302,10 @@ (*mapcount)--; pte_unmap_unlock(pte, ptl); + + if (clear_flush_young) + referenced += mmu_notifier_clear_flush_young(mm, address); + out: return referenced; } @@ -457,7 +464,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush_notify(vma, address, pte); + entry = ptep_clear_flush(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -465,6 +472,10 @@ } pte_unmap_unlock(pte, ptl); + + if (ret) + mmu_notifier_invalidate_page(mm, address); + out: return ret; } @@ -717,15 +728,14 @@ * If it's recently referenced (perhaps page_referenced * skipped over this mm) then we should reactivate it. */ - if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young_notify(vma, address, pte)))) { + if (!migration && (vma->vm_flags & VM_LOCKED)) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. 
*/ if (pte_dirty(pteval)) @@ -780,6 +790,8 @@ out_unmap: pte_unmap_unlock(pte, ptl); + if (ret != SWAP_FAIL) + mmu_notifier_invalidate_page(mm, address); out: return ret; } @@ -818,7 +830,7 @@ spinlock_t *ptl; struct page *page; unsigned long address; - unsigned long end; + unsigned long start, end; address = (vma->vm_start + cursor) & CLUSTER_MASK; end = address + CLUSTER_SIZE; @@ -839,6 +851,8 @@ if (!pmd_present(*pmd)) return; + start = address; + mmu_notifier_invalidate_range_start(mm, start, end); pte = pte_offset_map_lock(mm, pmd, address, &ptl); /* Update high watermark before we lower rss */ @@ -850,12 +864,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young_notify(vma, address, pte)) + if (ptep_clear_flush_young(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) @@ -871,6 +885,7 @@ (*mapcount)--; } pte_unmap_unlock(pte - 1, ptl); + mmu_notifier_invalidate_range_end(mm, start, end); } static int try_to_unmap_anon(struct page *page, int migration) From andrea at qumranet.com Tue Apr 22 06:51:21 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:21 +0200 Subject: [ofa-general] [PATCH 05 of 12] Move the tlb flushing into free_pgtables. The conversion of the locks In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID ee8c0644d5f67c1ef59142cce91b0bb6f34a53e0 # Parent ac9bb1fb3de2aa5d27210a28edf24f6577094076 Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map scanning would require taking sleeping locks in free_pgtables() and we cannot sleep while gathering pages for a tlb flush. 
Move the tlb_gather/tlb_finish call to free_pgtables() to be done for each vma. This may add a number of tlb flushes depending on the number of vmas that cannot be coalesced into one. The first pointer argument to free_pgtables() can then be dropped. Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -751,8 +751,8 @@ void *private); void free_pgd_range(struct mmu_gather **tlb, unsigned long addr, unsigned long end, unsigned long floor, unsigned long ceiling); -void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma, - unsigned long floor, unsigned long ceiling); +void free_pgtables(struct vm_area_struct *start_vma, unsigned long floor, + unsigned long ceiling); int copy_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *vma); void unmap_mapping_range(struct address_space *mapping, diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -272,9 +272,11 @@ } while (pgd++, addr = next, addr != end); } -void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma, - unsigned long floor, unsigned long ceiling) +void free_pgtables(struct vm_area_struct *vma, unsigned long floor, + unsigned long ceiling) { + struct mmu_gather *tlb; + while (vma) { struct vm_area_struct *next = vma->vm_next; unsigned long addr = vma->vm_start; @@ -286,7 +288,8 @@ unlink_file_vma(vma); if (is_vm_hugetlb_page(vma)) { - hugetlb_free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + hugetlb_free_pgd_range(&tlb, addr, vma->vm_end, floor, next? next->vm_start: ceiling); } else { /* @@ -299,9 +302,11 @@ anon_vma_unlink(vma); unlink_file_vma(vma); } - free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + free_pgd_range(&tlb, addr, vma->vm_end, floor, next? 
next->vm_start: ceiling); } + tlb_finish_mmu(tlb, addr, vma->vm_end); vma = next; } } diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1752,9 +1752,9 @@ update_hiwater_rss(mm); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, + tlb_finish_mmu(tlb, start, end); + free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - tlb_finish_mmu(tlb, start, end); } /* @@ -2050,8 +2050,8 @@ /* Use -1 here to ensure all VMAs in the mm are unmapped */ end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0); tlb_finish_mmu(tlb, 0, end); + free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* * Walk the list again, actually closing and freeing it, From andrea at qumranet.com Tue Apr 22 06:51:22 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:22 +0200 Subject: [ofa-general] [PATCH 06 of 12] Move the tlb flushing inside of unmap vmas. This saves us from passing In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID fbce3fecb033eb3fba1d9c2398ac74401ce0ecb5 # Parent ee8c0644d5f67c1ef59142cce91b0bb6f34a53e0 Move the tlb flushing inside of unmap vmas. This saves us from passing a pointer to the TLB structure around and simplifies the callers. 
Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -723,8 +723,7 @@ struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t); unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *); -unsigned long unmap_vmas(struct mmu_gather **tlb, - struct vm_area_struct *start_vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *start_vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -804,7 +804,6 @@ /** * unmap_vmas - unmap a range of memory covered by a list of vma's - * @tlbp: address of the caller's struct mmu_gather * @vma: the starting vma * @start_addr: virtual address at which to start unmapping * @end_addr: virtual address at which to end unmapping @@ -816,20 +815,13 @@ * Unmap all pages in the vma list. * * We aim to not hold locks for too long (for scheduling latency reasons). - * So zap pages in ZAP_BLOCK_SIZE bytecounts. This means we need to - * return the ending mmu_gather to the caller. + * So zap pages in ZAP_BLOCK_SIZE bytecounts. * * Only addresses between `start' and `end' will be unmapped. * * The VMA list must be sorted in ascending virtual address order. - * - * unmap_vmas() assumes that the caller will flush the whole unmapped address - * range after unmap_vmas() returns. So the only responsibility here is to - * ensure that any thus-far unmapped pages are flushed before unmap_vmas() - * drops the lock and schedules. 
*/ -unsigned long unmap_vmas(struct mmu_gather **tlbp, - struct vm_area_struct *vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *details) { @@ -838,9 +830,14 @@ int tlb_start_valid = 0; unsigned long start = start_addr; spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; - int fullmm = (*tlbp)->fullmm; + int fullmm; + struct mmu_gather *tlb; struct mm_struct *mm = vma->vm_mm; + lru_add_drain(); + tlb = tlb_gather_mmu(mm, 0); + update_hiwater_rss(mm); + fullmm = tlb->fullmm; mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -867,7 +864,7 @@ (HPAGE_SIZE / PAGE_SIZE); start = end; } else - start = unmap_page_range(*tlbp, vma, + start = unmap_page_range(tlb, vma, start, end, &zap_work, details); if (zap_work > 0) { @@ -875,22 +872,23 @@ break; } - tlb_finish_mmu(*tlbp, tlb_start, start); + tlb_finish_mmu(tlb, tlb_start, start); if (need_resched() || (i_mmap_lock && spin_needbreak(i_mmap_lock))) { if (i_mmap_lock) { - *tlbp = NULL; + tlb = NULL; goto out; } cond_resched(); } - *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); + tlb = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } + tlb_finish_mmu(tlb, start_addr, end_addr); out: mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ @@ -906,18 +904,10 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *details) { - struct mm_struct *mm = vma->vm_mm; - struct mmu_gather *tlb; unsigned long end = address + size; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); - if (tlb) - 
tlb_finish_mmu(tlb, address, end); - return end; + return unmap_vmas(vma, address, end, &nr_accounted, details); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1744,15 +1744,10 @@ unsigned long start, unsigned long end) { struct vm_area_struct *next = prev? prev->vm_next: mm->mmap; - struct mmu_gather *tlb; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); + unmap_vmas(vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, start, end); free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); } @@ -2034,7 +2029,6 @@ /* Release all mmaps. */ void exit_mmap(struct mm_struct *mm) { - struct mmu_gather *tlb; struct vm_area_struct *vma = mm->mmap; unsigned long nr_accounted = 0; unsigned long end; @@ -2045,12 +2039,11 @@ lru_add_drain(); flush_cache_mm(mm); - tlb = tlb_gather_mmu(mm, 1); + /* Don't update_hiwater_rss(mm) here, do_exit already did */ /* Use -1 here to ensure all VMAs in the mm are unmapped */ - end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); + end = unmap_vmas(vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, 0, end); free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* From andrea at qumranet.com Tue Apr 22 06:51:23 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:23 +0200 Subject: [ofa-general] [PATCH 07 of 12] Add a function to rw_semaphores to check if there are any processes In-Reply-To: Message-ID: <8965539f4d174c79bd37.1208872283@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID 8965539f4d174c79bd37e58e8b037d5db906e219 # Parent fbce3fecb033eb3fba1d9c2398ac74401ce0ecb5 Add a function to rw_semaphores to check if there are any processes waiting for the semaphore. 
Add rwsem_needbreak to sched.h that works in the same way as spinlock_needbreak(). Signed-off-by: Christoph Lameter diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -59,6 +59,8 @@ */ extern void downgrade_write(struct rw_semaphore *sem); +extern int rwsem_is_contended(struct rw_semaphore *sem); + #ifdef CONFIG_DEBUG_LOCK_ALLOC /* * nested locking. NOTE: rwsems are not allowed to recurse diff --git a/include/linux/sched.h b/include/linux/sched.h --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1984,6 +1984,15 @@ #endif } +static inline int rwsem_needbreak(struct rw_semaphore *sem) +{ +#ifdef CONFIG_PREEMPT + return rwsem_is_contended(sem); +#else + return 0; +#endif +} + /* * Reevaluate whether the task has signals pending delivery. * Wake the task if so. diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c --- a/lib/rwsem-spinlock.c +++ b/lib/rwsem-spinlock.c @@ -305,6 +305,18 @@ spin_unlock_irqrestore(&sem->wait_lock, flags); } +int rwsem_is_contended(struct rw_semaphore *sem) +{ + /* + * Racy check for an empty list. False positives or negatives + * would be okay. False positive may cause a useless dropping of + * locks. False negatives may cause locks to be held a bit + * longer until the next check. + */ + return !list_empty(&sem->wait_list); +} + +EXPORT_SYMBOL(rwsem_is_contended); EXPORT_SYMBOL(__init_rwsem); EXPORT_SYMBOL(__down_read); EXPORT_SYMBOL(__down_read_trylock); diff --git a/lib/rwsem.c b/lib/rwsem.c --- a/lib/rwsem.c +++ b/lib/rwsem.c @@ -251,6 +251,18 @@ return sem; } +int rwsem_is_contended(struct rw_semaphore *sem) +{ + /* + * Racy check for an empty list. False positives or negatives + * would be okay. False positive may cause a useless dropping of + * locks. False negatives may cause locks to be held a bit + * longer until the next check. 
+ */ + return !list_empty(&sem->wait_list); +} + +EXPORT_SYMBOL(rwsem_is_contended); EXPORT_SYMBOL(rwsem_down_read_failed); EXPORT_SYMBOL(rwsem_down_write_failed); EXPORT_SYMBOL(rwsem_wake); From andrea at qumranet.com Tue Apr 22 06:51:24 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:24 +0200 Subject: [ofa-general] [PATCH 08 of 12] The conversion to a rwsem allows notifier callbacks during rmap traversal In-Reply-To: Message-ID: <6e04df1f4284689b1c46.1208872284@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID 6e04df1f4284689b1c46e57a67559abe49ecf292 # Parent 8965539f4d174c79bd37e58e8b037d5db906e219 The conversion to a rwsem allows notifier callbacks during rmap traversal for files. A rw style lock also allows concurrent walking of the reverse map so that multiple processors can expire pages in the same memory area of the same process. So it increases the potential concurrency. Signed-off-by: Andrea Arcangeli Signed-off-by: Christoph Lameter diff --git a/Documentation/vm/locking b/Documentation/vm/locking --- a/Documentation/vm/locking +++ b/Documentation/vm/locking @@ -66,7 +66,7 @@ expand_stack(), it is hard to come up with a destructive scenario without having the vmlist protection in this case. -The page_table_lock nests with the inode i_mmap_lock and the kmem cache +The page_table_lock nests with the inode i_mmap_sem and the kmem cache c_spinlock spinlocks. This is okay, since the kmem code asks for pages after dropping c_spinlock. 
The page_table_lock also nests with pagecache_lock and pagemap_lru_lock spinlocks, and no code asks for memory with these locks diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -69,7 +69,7 @@ if (!vma_shareable(vma, addr)) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -94,7 +94,7 @@ put_page(virt_to_page(spte)); spin_unlock(&mm->page_table_lock); out: - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -454,10 +454,10 @@ pgoff = offset >> PAGE_SHIFT; i_size_write(inode, offset); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); if (!prio_tree_empty(&mapping->i_mmap)) hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); truncate_hugepages(inode, offset); return 0; } diff --git a/fs/inode.c b/fs/inode.c --- a/fs/inode.c +++ b/fs/inode.c @@ -210,7 +210,7 @@ INIT_LIST_HEAD(&inode->i_devices); INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); rwlock_init(&inode->i_data.tree_lock); - spin_lock_init(&inode->i_data.i_mmap_lock); + init_rwsem(&inode->i_data.i_mmap_sem); INIT_LIST_HEAD(&inode->i_data.private_list); spin_lock_init(&inode->i_data.private_lock); INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap); diff --git a/include/linux/fs.h b/include/linux/fs.h --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -503,7 +503,7 @@ unsigned int i_mmap_writable;/* count VM_SHARED mappings */ struct prio_tree_root i_mmap; /* tree of private and shared mappings */ struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */ - spinlock_t i_mmap_lock; /* protect tree, count, list */ + struct rw_semaphore i_mmap_sem; /* protect tree, count, 
list */ unsigned int truncate_count; /* Cover race condition with truncate */ unsigned long nrpages; /* number of total pages */ pgoff_t writeback_index;/* writeback starts here */ diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -716,7 +716,7 @@ struct address_space *check_mapping; /* Check page->mapping if set */ pgoff_t first_index; /* Lowest page->index to unmap */ pgoff_t last_index; /* Highest page->index to unmap */ - spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */ + struct rw_semaphore *i_mmap_sem; /* For unmap_mapping_range: */ unsigned long truncate_count; /* Compare vm_truncate_count */ }; diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -274,12 +274,12 @@ atomic_dec(&inode->i_writecount); /* insert tmp into the share list, just after mpnt */ - spin_lock(&file->f_mapping->i_mmap_lock); + down_write(&file->f_mapping->i_mmap_sem); tmp->vm_truncate_count = mpnt->vm_truncate_count; flush_dcache_mmap_lock(file->f_mapping); vma_prio_tree_add(tmp, mpnt); flush_dcache_mmap_unlock(file->f_mapping); - spin_unlock(&file->f_mapping->i_mmap_lock); + up_write(&file->f_mapping->i_mmap_sem); } /* diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -61,16 +61,16 @@ /* * Lock ordering: * - * ->i_mmap_lock (vmtruncate) + * ->i_mmap_sem (vmtruncate) * ->private_lock (__free_pte->__set_page_dirty_buffers) * ->swap_lock (exclusive_swap_page, others) * ->mapping->tree_lock * * ->i_mutex - * ->i_mmap_lock (truncate->unmap_mapping_range) + * ->i_mmap_sem (truncate->unmap_mapping_range) * * ->mmap_sem - * ->i_mmap_lock + * ->i_mmap_sem * ->page_table_lock or pte_lock (various, mainly in memory.c) * ->mapping->tree_lock (arch-dependent flush_dcache_mmap_lock) * @@ -87,7 +87,7 @@ * ->sb_lock (fs/fs-writeback.c) * ->mapping->tree_lock (__sync_single_inode) * - * ->i_mmap_lock + * ->i_mmap_sem * ->anon_vma.lock (vma_adjust) * * ->anon_vma.lock diff --git 
a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -184,7 +184,7 @@ if (!page) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { mm = vma->vm_mm; address = vma->vm_start + @@ -204,7 +204,7 @@ page_cache_release(page); } } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -206,13 +206,13 @@ } goto out; } - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); flush_dcache_mmap_lock(mapping); vma->vm_flags |= VM_NONLINEAR; vma_prio_tree_remove(vma, &mapping->i_mmap); vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } mmu_notifier_invalidate_range_start(mm, start, start + size); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -790,7 +790,7 @@ struct page *page; struct page *tmp; /* - * A page gathering list, protected by per file i_mmap_lock. The + * A page gathering list, protected by per file i_mmap_sem. The * lock is used to avoid list corruption from multiple unmapping * of the same page since we are using page->lru. */ @@ -840,9 +840,9 @@ * do nothing in this case. 
*/ if (vma->vm_file) { - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); __unmap_hugepage_range(vma, start, end); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); } } @@ -1085,7 +1085,7 @@ BUG_ON(address >= end); flush_cache_range(vma, address, end); - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); spin_lock(&mm->page_table_lock); for (; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -1100,7 +1100,7 @@ } } spin_unlock(&mm->page_table_lock); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); flush_tlb_range(vma, start, end); } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -829,7 +829,7 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; + struct rw_semaphore *i_mmap_sem = details? details->i_mmap_sem: NULL; int fullmm; struct mmu_gather *tlb; struct mm_struct *mm = vma->vm_mm; @@ -875,8 +875,8 @@ tlb_finish_mmu(tlb, tlb_start, start); if (need_resched() || - (i_mmap_lock && spin_needbreak(i_mmap_lock))) { - if (i_mmap_lock) { + (i_mmap_sem && rwsem_needbreak(i_mmap_sem))) { + if (i_mmap_sem) { tlb = NULL; goto out; } @@ -1742,7 +1742,7 @@ /* * Helper functions for unmap_mapping_range(). * - * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __ + * __ Notes on dropping i_mmap_sem to reduce latency while unmapping __ * * We have to restart searching the prio_tree whenever we drop the lock, * since the iterator is only valid while the lock is held, and anyway @@ -1761,7 +1761,7 @@ * can't efficiently keep all vmas in step with mapping->truncate_count: * so instead reset them all whenever it wraps back to 0 (then go to 1). 
* mapping->truncate_count and vma->vm_truncate_count are protected by - * i_mmap_lock. + * i_mmap_sem. * * In order to make forward progress despite repeatedly restarting some * large vma, note the restart_addr from unmap_vmas when it breaks out: @@ -1811,7 +1811,7 @@ restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr, details); - need_break = need_resched() || spin_needbreak(details->i_mmap_lock); + need_break = need_resched() || rwsem_needbreak(details->i_mmap_sem); if (restart_addr >= end_addr) { /* We have now completed this vma: mark it so */ @@ -1825,9 +1825,9 @@ goto again; } - spin_unlock(details->i_mmap_lock); + up_write(details->i_mmap_sem); cond_resched(); - spin_lock(details->i_mmap_lock); + down_write(details->i_mmap_sem); return -EINTR; } @@ -1921,9 +1921,9 @@ details.last_index = hba + hlen - 1; if (details.last_index < details.first_index) details.last_index = ULONG_MAX; - details.i_mmap_lock = &mapping->i_mmap_lock; + details.i_mmap_sem = &mapping->i_mmap_sem; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); /* Protect against endless unmapping loops */ mapping->truncate_count++; @@ -1938,7 +1938,7 @@ unmap_mapping_range_tree(&mapping->i_mmap, &details); if (unlikely(!list_empty(&mapping->i_mmap_nonlinear))) unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } EXPORT_SYMBOL(unmap_mapping_range); diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -211,12 +211,12 @@ if (!mapping) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) remove_migration_pte(vma, old, new); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -189,7 +189,7 @@ } /* - * Requires inode->i_mapping->i_mmap_lock + * Requires 
inode->i_mapping->i_mmap_sem */ static void __remove_shared_vm_struct(struct vm_area_struct *vma, struct file *file, struct address_space *mapping) @@ -217,9 +217,9 @@ if (file) { struct address_space *mapping = file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); __remove_shared_vm_struct(vma, file, mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } } @@ -442,7 +442,7 @@ mapping = vma->vm_file->f_mapping; if (mapping) { - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); vma->vm_truncate_count = mapping->truncate_count; } anon_vma_lock(vma); @@ -452,7 +452,7 @@ anon_vma_unlock(vma); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mm->map_count++; validate_mm(mm); @@ -539,7 +539,7 @@ mapping = file->f_mapping; if (!(vma->vm_flags & VM_NONLINEAR)) root = &mapping->i_mmap; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (importer && vma->vm_truncate_count != next->vm_truncate_count) { /* @@ -623,7 +623,7 @@ if (anon_vma) spin_unlock(&anon_vma->lock); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); if (remove_next) { if (file) @@ -2058,7 +2058,7 @@ /* Insert vm structure into process list sorted by address * and into the inode's i_mmap tree. If vm_file is non-NULL - * then i_mmap_lock is taken here. + * then i_mmap_sem is taken here. */ int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma) { diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -88,7 +88,7 @@ * and we propagate stale pages into the dst afterward. 
*/ mapping = vma->vm_file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (new_vma->vm_truncate_count && new_vma->vm_truncate_count != vma->vm_truncate_count) new_vma->vm_truncate_count = 0; @@ -120,7 +120,7 @@ pte_unmap_nested(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -24,7 +24,7 @@ * inode->i_alloc_sem (vmtruncate_range) * mm->mmap_sem * page->flags PG_locked (lock_page) - * mapping->i_mmap_lock + * mapping->i_mmap_sem * anon_vma->lock * mm->page_table_lock or pte_lock * zone->lru_lock (in mark_page_accessed, isolate_lru_page) @@ -373,14 +373,14 @@ * The page lock not only makes sure that page->mapping cannot * suddenly be NULLified by truncation, it makes sure that the * structure at mapping cannot be freed and reused yet, - * so we can safely take mapping->i_mmap_lock. + * so we can safely take mapping->i_mmap_sem. */ BUG_ON(!PageLocked(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); /* - * i_mmap_lock does not stabilize mapcount at all, but mapcount + * i_mmap_sem does not stabilize mapcount at all, but mapcount * is more likely to be accurate if we note it after spinning. 
*/ mapcount = page_mapcount(page); @@ -403,7 +403,7 @@ break; } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return referenced; } @@ -489,12 +489,12 @@ BUG_ON(PageAnon(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { if (vma->vm_flags & VM_SHARED) ret += page_mkclean_one(page, vma); } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return ret; } @@ -930,7 +930,7 @@ unsigned long max_nl_size = 0; unsigned int mapcount; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { ret = try_to_unmap_one(page, vma, migration); if (ret == SWAP_FAIL || !page_mapped(page)) @@ -967,7 +967,6 @@ mapcount = page_mapcount(page); if (!mapcount) goto out; - cond_resched_lock(&mapping->i_mmap_lock); max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK; if (max_nl_cursor == 0) @@ -989,7 +988,6 @@ } vma->vm_private_data = (void *) max_nl_cursor; } - cond_resched_lock(&mapping->i_mmap_lock); max_nl_cursor += CLUSTER_SIZE; } while (max_nl_cursor <= max_nl_size); @@ -1001,7 +999,7 @@ list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) vma->vm_private_data = NULL; out: - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); return ret; } From andrea at qumranet.com Tue Apr 22 06:51:25 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:25 +0200 Subject: [ofa-general] [PATCH 09 of 12] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID bdb3d928a0ba91cdce2b61bd40a2f80bddbe4ff2 # Parent 6e04df1f4284689b1c46e57a67559abe49ecf292 Convert the anon_vma spinlock to a rw semaphore. This allows concurrent traversal of reverse maps for try_to_unmap() and page_mkclean(). 
It also allows the calling of sleeping functions from reverse map traversal as needed for the notifier callbacks. This introduces additional concurrency. RCU is used in some contexts to guarantee the presence of the anon_vma (try_to_unmap) while we acquire the anon_vma lock. We cannot take a semaphore within an RCU critical section. Add a refcount to the anon_vma structure which allows us to give an existence guarantee for the anon_vma structure independent of the spinlock or the list contents. The refcount can then be taken within the RCU section. If it has been taken successfully then the refcount guarantees the existence of the anon_vma. The refcount in anon_vma also allows us to fix a nasty issue in page migration where we fudged by using RCU for a long code path to guarantee the existence of the anon_vma. I think this is a bug because the anon_vma may become empty and get scheduled to be freed but then we increase the refcount again when the migration entries are removed. The refcount in general allows a shortening of RCU critical sections since we can call rcu_read_unlock() after taking the refcount. This is particularly relevant if the anon_vma chains contain hundreds of entries. However: - Atomic overhead increases in situations where a new reference to the anon_vma has to be established or removed. Overhead also increases when a speculative reference is used (try_to_unmap, page_mkclean, page migration). - There is the potential for more frequent processor changes due to up_xxx letting waiting tasks run first. This results in e.g. the Aim9 brk performance test going down by 10-15%. Signed-off-by: Christoph Lameter diff --git a/include/linux/rmap.h b/include/linux/rmap.h --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -25,7 +25,8 @@ * pointing to this anon_vma once its vma list is empty.
*/ struct anon_vma { - spinlock_t lock; /* Serialize access to vma list */ + atomic_t refcount; /* vmas on the list */ + struct rw_semaphore sem;/* Serialize access to vma list */ struct list_head head; /* List of private "related" vmas */ }; @@ -43,18 +44,31 @@ kmem_cache_free(anon_vma_cachep, anon_vma); } +struct anon_vma *grab_anon_vma(struct page *page); + +static inline void get_anon_vma(struct anon_vma *anon_vma) +{ + atomic_inc(&anon_vma->refcount); +} + +static inline void put_anon_vma(struct anon_vma *anon_vma) +{ + if (atomic_dec_and_test(&anon_vma->refcount)) + anon_vma_free(anon_vma); +} + static inline void anon_vma_lock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); } static inline void anon_vma_unlock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } /* diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -235,15 +235,16 @@ return; /* - * We hold the mmap_sem lock. So no need to call page_lock_anon_vma. + * We hold either the mmap_sem lock or a reference on the + * anon_vma. So no need to call page_lock_anon_vma. */ anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); + down_read(&anon_vma->sem); list_for_each_entry(vma, &anon_vma->head, anon_vma_node) remove_migration_pte(vma, old, new); - spin_unlock(&anon_vma->lock); + up_read(&anon_vma->sem); } /* @@ -623,7 +624,7 @@ int rc = 0; int *result = NULL; struct page *newpage = get_new_page(page, private, &result); - int rcu_locked = 0; + struct anon_vma *anon_vma = NULL; int charge = 0; if (!newpage) @@ -647,16 +648,14 @@ } /* * By try_to_unmap(), page->mapcount goes down to 0 here. In this case, - * we cannot notice that anon_vma is freed while we migrates a page. + * we cannot notice that anon_vma is freed while we migrate a page. 
* This rcu_read_lock() delays freeing anon_vma pointer until the end * of migration. File cache pages are no problem because of page_lock() * File Caches may use write_page() or lock_page() in migration, then, * just care Anon page here. */ - if (PageAnon(page)) { - rcu_read_lock(); - rcu_locked = 1; - } + if (PageAnon(page)) + anon_vma = grab_anon_vma(page); /* * Corner case handling: @@ -674,10 +673,7 @@ if (!PageAnon(page) && PagePrivate(page)) { /* * Go direct to try_to_free_buffers() here because - * a) that's what try_to_release_page() would do anyway - * b) we may be under rcu_read_lock() here, so we can't - * use GFP_KERNEL which is what try_to_release_page() - * needs to be effective. + * that's what try_to_release_page() would do anyway */ try_to_free_buffers(page); } @@ -698,8 +694,8 @@ } else if (charge) mem_cgroup_end_migration(newpage); rcu_unlock: - if (rcu_locked) - rcu_read_unlock(); + if (anon_vma) + put_anon_vma(anon_vma); unlock: diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -567,7 +567,7 @@ if (vma->anon_vma) anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); /* * Easily overlooked: when mprotect shifts the boundary, * make sure the expanding vma has anon_vma set if the @@ -621,7 +621,7 @@ } if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); if (mapping) up_write(&mapping->i_mmap_sem); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -69,7 +69,7 @@ if (anon_vma) { allocated = NULL; locked = anon_vma; - spin_lock(&locked->lock); + down_write(&locked->sem); } else { anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) @@ -81,6 +81,7 @@ /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { + get_anon_vma(anon_vma); vma->anon_vma = anon_vma; list_add_tail(&vma->anon_vma_node, &anon_vma->head); allocated = NULL; @@ -88,7 +89,7 @@ spin_unlock(&mm->page_table_lock); if 
(locked) - spin_unlock(&locked->lock); + up_write(&locked->sem); if (unlikely(allocated)) anon_vma_free(allocated); } @@ -99,14 +100,17 @@ { BUG_ON(vma->anon_vma != next->anon_vma); list_del(&next->anon_vma_node); + put_anon_vma(vma->anon_vma); } void __anon_vma_link(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - if (anon_vma) + if (anon_vma) { + get_anon_vma(anon_vma); list_add_tail(&vma->anon_vma_node, &anon_vma->head); + } } void anon_vma_link(struct vm_area_struct *vma) @@ -114,36 +118,32 @@ struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + get_anon_vma(anon_vma); + down_write(&anon_vma->sem); list_add_tail(&vma->anon_vma_node, &anon_vma->head); - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } } void anon_vma_unlink(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - int empty; if (!anon_vma) return; - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); list_del(&vma->anon_vma_node); - - /* We must garbage collect the anon_vma if it's empty */ - empty = list_empty(&anon_vma->head); - spin_unlock(&anon_vma->lock); - - if (empty) - anon_vma_free(anon_vma); + up_write(&anon_vma->sem); + put_anon_vma(anon_vma); } static void anon_vma_ctor(struct kmem_cache *cachep, void *data) { struct anon_vma *anon_vma = data; - spin_lock_init(&anon_vma->lock); + init_rwsem(&anon_vma->sem); + atomic_set(&anon_vma->refcount, 0); INIT_LIST_HEAD(&anon_vma->head); } @@ -157,9 +157,9 @@ * Getting a lock on a stable anon_vma from a page off the LRU is * tricky: page_lock_anon_vma rely on RCU to guard against the races. 
*/ -static struct anon_vma *page_lock_anon_vma(struct page *page) +struct anon_vma *grab_anon_vma(struct page *page) { - struct anon_vma *anon_vma; + struct anon_vma *anon_vma = NULL; unsigned long anon_mapping; rcu_read_lock(); @@ -170,17 +170,26 @@ goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); - return anon_vma; + if (!atomic_inc_not_zero(&anon_vma->refcount)) + anon_vma = NULL; out: rcu_read_unlock(); - return NULL; + return anon_vma; +} + +static struct anon_vma *page_lock_anon_vma(struct page *page) +{ + struct anon_vma *anon_vma = grab_anon_vma(page); + + if (anon_vma) + down_read(&anon_vma->sem); + return anon_vma; } static void page_unlock_anon_vma(struct anon_vma *anon_vma) { - spin_unlock(&anon_vma->lock); - rcu_read_unlock(); + up_read(&anon_vma->sem); + put_anon_vma(anon_vma); } /* From andrea at qumranet.com Tue Apr 22 06:51:26 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:26 +0200 Subject: [ofa-general] [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID f8210c45f1c6f8b38d15e5dfebbc5f7c1f890c93 # Parent bdb3d928a0ba91cdce2b61bd40a2f80bddbe4ff2 Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock conversion. Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1062,10 +1062,10 @@ * mm_lock and mm_unlock are expensive operations that may take a long time. 
*/ struct mm_lock_data { - spinlock_t **i_mmap_locks; - spinlock_t **anon_vma_locks; - size_t nr_i_mmap_locks; - size_t nr_anon_vma_locks; + struct rw_semaphore **i_mmap_sems; + struct rw_semaphore **anon_vma_sems; + size_t nr_i_mmap_sems; + size_t nr_anon_vma_sems; }; extern int mm_lock(struct mm_struct *mm, struct mm_lock_data *data); extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2243,8 +2243,8 @@ static int mm_lock_cmp(const void *a, const void *b) { cond_resched(); - if ((unsigned long)*(spinlock_t **)a < - (unsigned long)*(spinlock_t **)b) + if ((unsigned long)*(struct rw_semaphore **)a < + (unsigned long)*(struct rw_semaphore **)b) return -1; else if (a == b) return 0; @@ -2252,7 +2252,7 @@ return 1; } -static unsigned long mm_lock_sort(struct mm_struct *mm, spinlock_t **locks, +static unsigned long mm_lock_sort(struct mm_struct *mm, struct rw_semaphore **sems, int anon) { struct vm_area_struct *vma; @@ -2261,59 +2261,59 @@ for (vma = mm->mmap; vma; vma = vma->vm_next) { if (anon) { if (vma->anon_vma) - locks[i++] = &vma->anon_vma->lock; + sems[i++] = &vma->anon_vma->sem; } else { if (vma->vm_file && vma->vm_file->f_mapping) - locks[i++] = &vma->vm_file->f_mapping->i_mmap_lock; + sems[i++] = &vma->vm_file->f_mapping->i_mmap_sem; } } if (!i) goto out; - sort(locks, i, sizeof(spinlock_t *), mm_lock_cmp, NULL); + sort(sems, i, sizeof(struct rw_semaphore *), mm_lock_cmp, NULL); out: return i; } static inline unsigned long mm_lock_sort_anon_vma(struct mm_struct *mm, - spinlock_t **locks) + struct rw_semaphore **sems) { - return mm_lock_sort(mm, locks, 1); + return mm_lock_sort(mm, sems, 1); } static inline unsigned long mm_lock_sort_i_mmap(struct mm_struct *mm, - spinlock_t **locks) + struct rw_semaphore **sems) { - return mm_lock_sort(mm, locks, 0); + return mm_lock_sort(mm, sems, 0); } -static void mm_lock_unlock(spinlock_t **locks, size_t nr, int lock) +static void 
mm_lock_unlock(struct rw_semaphore **sems, size_t nr, int lock) { - spinlock_t *last = NULL; + struct rw_semaphore *last = NULL; size_t i; for (i = 0; i < nr; i++) /* Multiple vmas may use the same lock. */ - if (locks[i] != last) { - BUG_ON((unsigned long) last > (unsigned long) locks[i]); - last = locks[i]; + if (sems[i] != last) { + BUG_ON((unsigned long) last > (unsigned long) sems[i]); + last = sems[i]; if (lock) - spin_lock(last); + down_write(last); else - spin_unlock(last); + up_write(last); } } -static inline void __mm_lock(spinlock_t **locks, size_t nr) +static inline void __mm_lock(struct rw_semaphore **sems, size_t nr) { - mm_lock_unlock(locks, nr, 1); + mm_lock_unlock(sems, nr, 1); } -static inline void __mm_unlock(spinlock_t **locks, size_t nr) +static inline void __mm_unlock(struct rw_semaphore **sems, size_t nr) { - mm_lock_unlock(locks, nr, 0); + mm_lock_unlock(sems, nr, 0); } /* @@ -2325,57 +2325,57 @@ */ int mm_lock(struct mm_struct *mm, struct mm_lock_data *data) { - spinlock_t **anon_vma_locks, **i_mmap_locks; + struct rw_semaphore **anon_vma_sems, **i_mmap_sems; down_write(&mm->mmap_sem); if (mm->map_count) { - anon_vma_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); - if (unlikely(!anon_vma_locks)) { + anon_vma_sems = vmalloc(sizeof(struct rw_semaphore *) * mm->map_count); + if (unlikely(!anon_vma_sems)) { up_write(&mm->mmap_sem); return -ENOMEM; } - i_mmap_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); - if (unlikely(!i_mmap_locks)) { + i_mmap_sems = vmalloc(sizeof(struct rw_semaphore *) * mm->map_count); + if (unlikely(!i_mmap_sems)) { up_write(&mm->mmap_sem); - vfree(anon_vma_locks); + vfree(anon_vma_sems); return -ENOMEM; } - data->nr_anon_vma_locks = mm_lock_sort_anon_vma(mm, anon_vma_locks); - data->nr_i_mmap_locks = mm_lock_sort_i_mmap(mm, i_mmap_locks); + data->nr_anon_vma_sems = mm_lock_sort_anon_vma(mm, anon_vma_sems); + data->nr_i_mmap_sems = mm_lock_sort_i_mmap(mm, i_mmap_sems); - if (data->nr_anon_vma_locks) { - 
__mm_lock(anon_vma_locks, data->nr_anon_vma_locks); - data->anon_vma_locks = anon_vma_locks; + if (data->nr_anon_vma_sems) { + __mm_lock(anon_vma_sems, data->nr_anon_vma_sems); + data->anon_vma_sems = anon_vma_sems; } else - vfree(anon_vma_locks); + vfree(anon_vma_sems); - if (data->nr_i_mmap_locks) { - __mm_lock(i_mmap_locks, data->nr_i_mmap_locks); - data->i_mmap_locks = i_mmap_locks; + if (data->nr_i_mmap_sems) { + __mm_lock(i_mmap_sems, data->nr_i_mmap_sems); + data->i_mmap_sems = i_mmap_sems; } else - vfree(i_mmap_locks); + vfree(i_mmap_sems); } return 0; } -static void mm_unlock_vfree(spinlock_t **locks, size_t nr) +static void mm_unlock_vfree(struct rw_semaphore **sems, size_t nr) { - __mm_unlock(locks, nr); - vfree(locks); + __mm_unlock(sems, nr); + vfree(sems); } /* avoid memory allocations for mm_unlock to prevent deadlock */ void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) { if (mm->map_count) { - if (data->nr_anon_vma_locks) - mm_unlock_vfree(data->anon_vma_locks, - data->nr_anon_vma_locks); - if (data->i_mmap_locks) - mm_unlock_vfree(data->i_mmap_locks, - data->nr_i_mmap_locks); + if (data->nr_anon_vma_sems) + mm_unlock_vfree(data->anon_vma_sems, + data->nr_anon_vma_sems); + if (data->i_mmap_sems) + mm_unlock_vfree(data->i_mmap_sems, + data->nr_i_mmap_sems); } up_write(&mm->mmap_sem); } From andrea at qumranet.com Tue Apr 22 06:51:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:27 +0200 Subject: [ofa-general] [PATCH 11 of 12] XPMEM would have used sys_madvise() except that madvise_dontneed() In-Reply-To: Message-ID: <128d705f38c8a774ac11.1208872287@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID 128d705f38c8a774ac11559db445787ce6e91c77 # Parent f8210c45f1c6f8b38d15e5dfebbc5f7c1f890c93 XPMEM would have used sys_madvise() except that madvise_dontneed() returns an -EINVAL if VM_PFNMAP is set, which is always true for the pages XPMEM imports from other partitions 
and is also true for uncached pages allocated locally via the mspec allocator. XPMEM needs zap_page_range() functionality for these types of pages as well as 'normal' pages. Signed-off-by: Dean Nelson diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -909,6 +909,7 @@ return unmap_vmas(vma, address, end, &nr_accounted, details); } +EXPORT_SYMBOL_GPL(zap_page_range); /* * Do a quick page-table lookup for a single page. From andrea at qumranet.com Tue Apr 22 06:51:28 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:28 +0200 Subject: [ofa-general] [PATCH 12 of 12] This patch adds a lock ordering rule to avoid a potential deadlock when In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID e847039ee2e815088661933b7195584847dc7540 # Parent 128d705f38c8a774ac11559db445787ce6e91c77 This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need to be locked. Signed-off-by: Dean Nelson diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -79,6 +79,9 @@ * * ->i_mutex (generic_file_buffered_write) * ->mmap_sem (fault_in_pages_readable->do_page_fault) + * + * When taking multiple mmap_sems, one should lock the lowest-addressed + * one first proceeding on up to the highest-addressed one. * * ->i_mutex * ->i_alloc_sem (various) From yevgenyp at mellanox.co.il Tue Apr 22 07:10:07 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 17:10:07 +0300 Subject: [ofa-general][PATCH] mlx4: Dynamic port configuration (MP support, Patch 7) Message-ID: <480DF1BF.5000702@mellanox.co.il> >From e13bef843cb2c7cee5a0ba388d97e21188087424 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 15:14:30 +0300 Subject: [PATCH] mlx4: Dynamic port configuration Port type can be set using sysfs interface when the low level driver is up. 
The low level driver unregisters all its customers and then registers them again with the new port types (which they query for in add_one) Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/main.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 97 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index a528809..e3fd4e9 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -281,6 +281,96 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) return 0; } +static int mlx4_change_port_types(struct mlx4_dev *dev, + enum mlx4_port_type *port_types) +{ + int i; + int err = 0; + int change = 0; + int port; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (port_types[i] != dev->caps.port_type[i + 1]) { + change = 1; + dev->caps.port_type[i + 1] = port_types[i]; + } + } + if (change) { + mlx4_unregister_device(dev); + for (port = 1; port <= dev->caps.num_ports; port++) { + mlx4_CLOSE_PORT(dev, port); + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, " + "aborting\n", port); + return err; + } + } + err = mlx4_register_device(dev); + } + return err; +} + +static ssize_t show_port_type(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + int i; + + sprintf(buf, "Current port types:\n"); + for (i = 1; i <= MLX4_MAX_PORTS; i++) { + sprintf(buf, "%sPort%d: %s\n", buf, i, + (mdev->caps.port_type[i] == MLX4_PORT_TYPE_IB)? 
+ "ib": "eth"); + } + return strlen(buf); +} + +static ssize_t set_port_type(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + char *type; + enum mlx4_port_type port_types[MLX4_MAX_PORTS]; + char *loc_buf; + char *ptr; + int i; + int err = 0; + + loc_buf = kmalloc(count + 1, GFP_KERNEL); + if (!loc_buf) + return -ENOMEM; + + ptr = loc_buf; + memcpy(loc_buf, buf, count + 1); + for (i = 0; i < MLX4_MAX_PORTS; i++) { + type = strsep(&loc_buf, ","); + if (!strcmp(type, "ib")) + port_types[i] = MLX4_PORT_TYPE_IB; + else if (!strcmp(type, "eth")) + port_types[i] = MLX4_PORT_TYPE_ETH; + else { + dev_warn(dev, "%s is not acceptable port type " + "(use 'eth' or 'ib' only)\n", type); + err = -EINVAL; + goto out; + } + } + err = mlx4_check_port_params(mdev, port_types); + if (err) + goto out; + + err = mlx4_change_port_types(mdev, port_types); +out: + kfree(ptr); + return err ? 
err: count; +} +static DEVICE_ATTR(mlx4_port_type, S_IWUGO | S_IRUGO, show_port_type, set_port_type); + static int mlx4_load_fw(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -979,8 +1069,14 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) pci_set_drvdata(pdev, dev); + if (device_create_file(&pdev->dev, &dev_attr_mlx4_port_type)) + goto err_sysfs; + return 0; +err_sysfs: + mlx4_unregister_device(dev); + err_cleanup: mlx4_cleanup_mcg_table(dev); mlx4_cleanup_qp_table(dev); @@ -1036,6 +1132,7 @@ static void mlx4_remove_one(struct pci_dev *pdev) int p; if (dev) { + device_remove_file(&pdev->dev, &dev_attr_mlx4_port_type); mlx4_unregister_device(dev); for (p = 1; p <= dev->caps.num_ports; ++p) -- 1.5.4 From yevgenyp at mellanox.co.il Tue Apr 22 07:13:54 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 17:13:54 +0300 Subject: [ofa-general][PATCH] mlx4: Completion EQ per cpu (MP support, Patch 10) Message-ID: <480DF2A2.8030602@mellanox.co.il> >From 2a2d22208f6fdba4c0c2afdf0ed12ef07b93d661 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 16:39:47 +0300 Subject: [PATCH] mlx4: Completion EQ per cpu Completion eq's are created per cpu. Created cq's are attached to an eq by "Round Robin" algorithm, unless a specific eq was requested. 
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/net/mlx4/cq.c | 19 ++++++++++++++++--- drivers/net/mlx4/eq.c | 39 ++++++++++++++++++++++++++------------- drivers/net/mlx4/main.c | 14 ++++++++------ drivers/net/mlx4/mlx4.h | 6 ++++-- include/linux/mlx4/device.h | 3 ++- 6 files changed, 57 insertions(+), 26 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 63daf52..732f812 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq, 0); + cq->db.dma, &cq->mcq, vector, 0); if (err) goto err_dbmap; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index d893cc1..bbb4c7b 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -189,7 +189,7 @@ EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed) + unsigned vector, int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -227,7 +227,20 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); - cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; + + if (vector > priv->eq_table.num_comp_eqs) { + err = -EINVAL; + goto err_radix; + } + + if (vector == 0) { + vector = priv->eq_table.last_comp_eq % + priv->eq_table.num_comp_eqs + 1; + priv->eq_table.last_comp_eq = vector; + } + cq->comp_eq_idx = MLX4_EQ_COMP_CPU0 + vector - 1; + cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + vector - 1].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; mtt_addr = 
mlx4_mtt_addr(dev, mtt); @@ -276,7 +289,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) if (err) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); - synchronize_irq(priv->eq_table.eq[MLX4_EQ_COMP].irq); + synchronize_irq(priv->eq_table.eq[cq->comp_eq_idx].irq); spin_lock_irq(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index e141a15..b4676db 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -265,7 +265,7 @@ static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) writel(priv->eq_table.clr_mask, priv->eq_table.clr_int); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) work |= mlx4_eq_int(dev, &priv->eq_table.eq[i]); return IRQ_RETVAL(work); @@ -482,7 +482,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + eq_table->num_comp_eqs; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); } @@ -553,6 +553,7 @@ void mlx4_unmap_eq_icm(struct mlx4_dev *dev) int mlx4_init_eq_table(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); + int req_eqs; int err; int i; @@ -573,11 +574,22 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) priv->eq_table.clr_int = priv->clr_base + (priv->eq_table.inta_pin < 32 ? 4 : 0); - err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_COMP : 0, - &priv->eq_table.eq[MLX4_EQ_COMP]); - if (err) - goto err_out_unmap; + priv->eq_table.num_comp_eqs = 0; + req_eqs = (dev->flags & MLX4_FLAG_MSI_X) ? num_online_cpus() : 1; + while (req_eqs) { + err = mlx4_create_eq( + dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, + (dev->flags & MLX4_FLAG_MSI_X) ? 
+ (MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs) : 0, + &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs]); + if (err) + goto err_out_comp; + + priv->eq_table.num_comp_eqs++; + req_eqs--; + } + priv->eq_table.last_comp_eq = 0; err = mlx4_create_eq(dev, MLX4_NUM_ASYNC_EQE + MLX4_NUM_SPARE_EQE, (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_ASYNC : 0, @@ -587,11 +599,12 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) if (dev->flags & MLX4_FLAG_MSI_X) { static const char *eq_name[] = { - [MLX4_EQ_COMP] = DRV_NAME " (comp)", + [MLX4_EQ_COMP_CPU0...MLX4_NUM_EQ] = "comp_" DRV_NAME, [MLX4_EQ_ASYNC] = DRV_NAME " (async)" }; - for (i = 0; i < MLX4_NUM_EQ; ++i) { + for (i = 0; i < MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs; ++i) { err = request_irq(priv->eq_table.eq[i].irq, mlx4_msi_x_interrupt, 0, eq_name[i], priv->eq_table.eq + i); @@ -616,7 +629,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) mlx4_warn(dev, "MAP_EQ for async EQ %d failed (%d)\n", priv->eq_table.eq[MLX4_EQ_ASYNC].eqn, err); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) eq_set_ci(&priv->eq_table.eq[i], 1); return 0; @@ -625,9 +638,9 @@ err_out_async: mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_ASYNC]); err_out_comp: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP]); + for (i = 0; i < priv->eq_table.num_comp_eqs; ++i) + mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + i]); -err_out_unmap: mlx4_unmap_clr_int(dev); mlx4_free_irqs(dev); @@ -646,7 +659,7 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) mlx4_free_irqs(dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) mlx4_free_eq(dev, &priv->eq_table.eq[i]); mlx4_unmap_clr_int(dev); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index e3fd4e9..aecb1f2 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -922,22 +922,24 @@ static void mlx4_enable_msi_x(struct 
mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); struct msix_entry entries[MLX4_NUM_EQ]; + int needed_vectors = MLX4_EQ_COMP_CPU0 + num_online_cpus(); int err; int i; if (msi_x) { - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) entries[i].entry = i; - err = pci_enable_msix(dev->pdev, entries, ARRAY_SIZE(entries)); + err = pci_enable_msix(dev->pdev, entries, needed_vectors); if (err) { if (err > 0) - mlx4_info(dev, "Only %d MSI-X vectors available, " - "not using MSI-X\n", err); + mlx4_info(dev, "Only %d MSI-X vectors " + "available, need %d. Not using MSI-X\n", + err, needed_vectors); goto no_msi; } - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = entries[i].vector; dev->flags |= MLX4_FLAG_MSI_X; @@ -945,7 +947,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) } no_msi: - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = dev->pdev->irq; } diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index eff1c5a..2201a99 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -64,8 +64,8 @@ enum { enum { MLX4_EQ_ASYNC, - MLX4_EQ_COMP, - MLX4_NUM_EQ + MLX4_EQ_COMP_CPU0, + MLX4_NUM_EQ = MLX4_EQ_COMP_CPU0 + NR_CPUS }; enum { @@ -211,6 +211,8 @@ struct mlx4_eq_table { void __iomem *uar_map[(MLX4_NUM_EQ + 6) / 4]; u32 clr_mask; struct mlx4_eq eq[MLX4_NUM_EQ]; + int num_comp_eqs; + int last_comp_eq; u64 icm_virt; struct page *icm_page; dma_addr_t icm_dma; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 93c17aa..673462c 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -312,6 +312,7 @@ struct mlx4_cq { int arm_sn; int cqn; + int comp_eq_idx; atomic_t refcount; struct completion free; @@ -441,7 +442,7 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, 
struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed); + unsigned vector, int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); -- 1.5.4 From yevgenyp at mellanox.co.il Tue Apr 22 07:12:10 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 17:12:10 +0300 Subject: ***SPAM*** [ofa-general][PATCH] mlx4: Collapsed CQ support (MP support, Patch 9) Message-ID: <480DF23A.7090304@mellanox.co.il> >From 749a2b62acc505a9ab2437eddb4cdd45503183d0 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 15:50:51 +0300 Subject: [PATCH] mlx4: Collapsed CQ support Changed cq creation API to support the creation of collapsed cqs. Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/net/mlx4/cq.c | 4 +++- include/linux/mlx4/device.h | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 5e570bb..63daf52 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq); + cq->db.dma, &cq->mcq, 0); if (err) goto err_dbmap; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index caa5bcf..d893cc1 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -188,7 +188,8 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -224,6 +225,7 @@ int mlx4_cq_alloc(struct 
mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context = mailbox->buf; memset(cq_context, 0, sizeof *cq_context); + cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 4ca3a00..93c17aa 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -440,7 +440,8 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int size); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); -- 1.5.4 From hrosenstock at xsigo.com Tue Apr 22 07:17:22 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Tue, 22 Apr 2008 07:17:22 -0700 Subject: [ofa-general] ***SPAM*** Re: [ewg] OFED April 21 meeting summary In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: <1208873842.18376.297.camel@hrosenstock-ws.xsigo.com> Hi Tziporet, On Tue, 2008-04-22 at 16:59 +0300, Tziporet Koren wrote: > OFED April 21 meeting summary about 1.3.1 plans and OFED 1.4 > development: > 2. OFED 1.4: > > Release features were presented at Sonoma (presentation > available at > http://www.openfabrics.org/archives/april2008sonoma.htm) > > IPv6: Woody is looking for resources to add IPv6 support to > the CMA. Hal noted that it will require a change in opensm > too. 
> > Xsigo Vnic & Vhba - Not clear if they will make it
> > Kernel tree is under work at:
> git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel
> We should try to get the kernel code to compile as soon as
> possible so everybody will be able to contribute code.

My notes also had: Reliable multicast was thought not to be able to make OFED 1.4.

-- Hal

From ronniz at mellanox.co.il Tue Apr 22 07:17:57 2008
From: ronniz at mellanox.co.il (Ronni Zimmermann)
Date: Tue, 22 Apr 2008 17:17:57 +0300
Subject: [ofa-general] add device capability flag to indicate support for creation of UC QPs which are attached to an SRQ
Message-ID: <6C2C79E72C305246B504CBA17B5500C903DA9BD6@mtlexch01.mtl.com>

Hi,

According to the IB spec release 1.2.1 (section 11-7.2-1.1), an HCA can support attachment of UC QPs to an SRQ. Since it is possible for an HCA to support SRQs without supporting attachment of UC QPs to them, I believe we need a new device capability flag to indicate whether or not the device supports this operation.

Regards,
Ronni.

Ronni Zimmermann
SW Verification Group
Mellanox Technologies Ltd.

From dada1 at cosmosbay.com Tue Apr 22 07:56:10 2008
From: dada1 at cosmosbay.com (Eric Dumazet)
Date: Tue, 22 Apr 2008 16:56:10 +0200
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
Message-ID: <480DFC8A.8040105@cosmosbay.com>

Andrea Arcangeli wrote:
> +
> +static int mm_lock_cmp(const void *a, const void *b)
> +{
> +	cond_resched();
> +	if ((unsigned long)*(spinlock_t **)a <
> +	    (unsigned long)*(spinlock_t **)b)
> +		return -1;
> +	else if (a == b)
> +		return 0;
> +	else
> +		return 1;
> +}
> +

This compare function looks unusual... It should work, but sort() could be faster if the "if (a == b)" test had a chance to be true eventually:

static int mm_lock_cmp(const void *a, const void *b)
{
	unsigned long la = (unsigned long)*(spinlock_t **)a;
	unsigned long lb = (unsigned long)*(spinlock_t **)b;

	cond_resched();
	if (la < lb)
		return -1;
	if (la > lb)
		return 1;
	return 0;
}

From andrea at qumranet.com Tue Apr 22 08:15:30 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 17:15:30 +0200
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
In-Reply-To: <480DFC8A.8040105@cosmosbay.com>
References: <480DFC8A.8040105@cosmosbay.com>
Message-ID: <20080422151529.GE24536@duo.random>

On Tue, Apr 22, 2008 at 04:56:10PM +0200, Eric Dumazet wrote:
> This compare function looks unusual...
> It should work, but sort() could be faster if the
> if (a == b) test had a chance to be true eventually...

Hmm, are you saying my mm_lock_cmp won't return 0 if a==b?

> static int mm_lock_cmp(const void *a, const void *b)
> {
> 	unsigned long la = (unsigned long)*(spinlock_t **)a;
> 	unsigned long lb = (unsigned long)*(spinlock_t **)b;
>
> 	cond_resched();
> 	if (la < lb)
> 		return -1;
> 	if (la > lb)
> 		return 1;
> 	return 0;
> }

If your intent is to use the assumption that there are going to be few equal entries, you should have used likely(la > lb) to signal that it is rarely going to return zero; otherwise gcc is free to do whatever it wants with the above. Overall that function is such a slow path that this is going to be lost in the noise. My suggestion would be to defer micro-optimizations like this until after 1/12 is applied to mainline.

Thanks!
From avi at qumranet.com Tue Apr 22 08:24:20 2008
From: avi at qumranet.com (Avi Kivity)
Date: Tue, 22 Apr 2008 18:24:20 +0300
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
In-Reply-To: <20080422151529.GE24536@duo.random>
References: <480DFC8A.8040105@cosmosbay.com> <20080422151529.GE24536@duo.random>
Message-ID: <480E0324.6050006@qumranet.com>

Andrea Arcangeli wrote:
> Hmm, are you saying my mm_lock_cmp won't return 0 if a==b?

You need to compare *a to *b (at least, that's what you're doing for the < case).
--
error compiling committee.c: too many arguments to function

From holt at sgi.com Tue Apr 22 08:26:00 2008
From: holt at sgi.com (Robin Holt)
Date: Tue, 22 Apr 2008 10:26:00 -0500
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422134847.GT12709@duo.random>
Message-ID: <20080422152600.GP30298@sgi.com>

Andrew,

Could we get direction/guidance from you as regards the invalidate_page() callout of Andrea's patch set versus the invalidate_range_start/invalidate_range_end callout pairs of Christoph's patch set? This is only in the context of the __xip_unmap, do_wp_page, page_mkclean_one, and try_to_unmap_one call sites.

On Tue, Apr 22, 2008 at 03:48:47PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 08:36:04AM -0500, Robin Holt wrote:
> > I am a little confused about the value of the seq_lock versus a simple
> > atomic, but I assumed there is a reason and left it at that.
>
> There's no value for anything but get_user_pages (get_user_pages takes
> its own lock internally though). I preferred to explain it as a
> seqlock because it was simpler for reading, but I totally agree in the
> final implementation it shouldn't be a seqlock. My code was meant to
> be pseudo-code only. It doesn't even need to be atomic ;).

Unless there is additional locking in your fault path, I think it does need to be atomic.

> > I don't know what you mean by "it'd" run slower and what you mean by
> > "armed and disarmed".
>
> 1) when armed the time-window where the kvm-page-fault would be
> blocked would be a bit larger without invalidate_page for no good
> reason

But that is a distinction without a difference. In the _start/_end case, kvm's fault handler will not have any _DIRECT_ blocking, but get_user_pages() had certainly better block waiting for some other lock to prevent the process's pages being refaulted. I am no VM expert, but that seems like it is critical to having a consistent virtual address space. Effectively, you have a delay on the kvm fault handler beginning when either invalidate_page() is entered or invalidate_range_start() is entered until when the _CALLER_ of the invalidate* method has unlocked. That time will remain essentially identical for either case. I would argue you would be hard pressed to even measure the difference.

> 2) if you were to remove invalidate_page when disarmed the VM could
> would need two branches instead of one in various places

Those branches are conditional upon there being list entries. That check should be extremely cheap. The vast majority of cases will have no registered notifiers. The second check for the _end callout will be from cpu cache.

> I don't want to waste cycles if not wasting them improves performance
> both when armed and disarmed.

In summary, I think we have narrowed down the case of no registered notifiers to being infinitesimal, and the case of registered notifiers to being a distinction without a difference.

> > When I was discussing this difference with Jack, he reminded me that
> > the GRU, due to its hardware, does not have any race issues with the
> > invalidate_page callout simply doing the tlb shootdown and not modifying
> > any of its internal structures. He then put a caveat on the discussion
> > that _either_ method was acceptable as far as he was concerned. The real
> > issue is getting a patch in that satisfies all needs and not whether
> > there is a separate invalidate_page callout.
>
> Sure, we have that patch now, I'll send it out in a minute, I was just
> trying to explain why it makes sense to have an invalidate_page too
> (which remains the only difference by now), removing it would be a
> regression on all sides, even if a minor one.

I think GRU is the only compelling case I have heard for having the invalidate_page separate. In the case of the GRU, the hardware enforces a lifetime of the invalidate which covers all in-progress faults, including ones where the hardware is informed after the flush of a PTE. In all cases, once the GRU invalidate instruction is issued, all active requests are invalidated. Future faults will be blocked in get_user_pages(). Without that special feature of the hardware, I don't think any code simplification exists. I, of course, reserve the right to be wrong.

I believe the argument against a separate invalidate_page() callout was Christoph's interpretation of Andrew's comments. I am not certain Andrew was aware of these special aspects of the GRU hardware and whether they had been factored into the discussion at that point in time.

Thanks,
Robin

From xma at us.ibm.com Tue Apr 22 08:01:51 2008
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 22 Apr 2008 08:01:51 -0700
Subject: Re: [ofa-general][PATCH] mlx4: Completion EQ per cpu (MP support, Patch 10)
In-Reply-To: <480DF2A2.8030602@mellanox.co.il>

Hello Yevgeny,

Can you give more details of this patch? What's the relationship between CQ, EQ, port? I was thinking to implement it in upper layer.
Is it better to implement in upper layer protocol, rather than device layer? thanks Shirley -------------- next part -------------- An HTML attachment was scrubbed... URL: From dada1 at cosmosbay.com Tue Apr 22 08:37:38 2008 From: dada1 at cosmosbay.com (Eric Dumazet) Date: Tue, 22 Apr 2008 17:37:38 +0200 Subject: [ofa-general] ***SPAM*** Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422151529.GE24536@duo.random> References: <480DFC8A.8040105@cosmosbay.com> <20080422151529.GE24536@duo.random> Message-ID: <480E0642.6080109@cosmosbay.com> Andrea Arcangeli a écrit : > On Tue, Apr 22, 2008 at 04:56:10PM +0200, Eric Dumazet wrote: > >> Andrea Arcangeli a écrit : >> >>> + >>> +static int mm_lock_cmp(const void *a, const void *b) >>> +{ >>> + cond_resched(); >>> + if ((unsigned long)*(spinlock_t **)a < >>> + (unsigned long)*(spinlock_t **)b) >>> + return -1; >>> + else if (a == b) >>> + return 0; >>> + else >>> + return 1; >>> +} >>> + >>> >> This compare function looks unusual... >> It should work, but sort() could be faster if the >> if (a == b) test had a chance to be true eventually... >> > > Hmm, are you saying my mm_lock_cmp won't return 0 if a==b? > I am saying your intent was probably to test else if ((unsigned long)*(spinlock_t **)a == (unsigned long)*(spinlock_t **)b) return 0; Because a and b are pointers to the data you want to compare. You need to dereference them. >> static int mm_lock_cmp(const void *a, const void *b) >> { >> unsigned long la = (unsigned long)*(spinlock_t **)a; >> unsigned long lb = (unsigned long)*(spinlock_t **)b; >> >> cond_resched(); >> if (la < lb) >> return -1; >> if (la > lb) >> return 1; >> return 0; >> } >> > > If your intent is to use the assumption that there are going to be few > equal entries, you should have used likely(la > lb) to signal it's > rarely going to return zero or gcc is likely free to do whatever it > wants with the above. 
> Overall that function is such a slow path that
> this is going to be lost in the noise. My suggestion would be to defer
> microoptimizations like this after 1/12 will be applied to mainline.
>
> Thanks!

Hum, it's not a micro-optimization, but a bug fix. :) Sorry if it was not clear.

From rdreier at cisco.com Tue Apr 22 08:45:26 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 22 Apr 2008 08:45:26 -0700
Subject: [ofa-general] Problem with libibverbs and huge pages registration.
In-Reply-To: <20080422111412.GH7771@minantech.com> (Gleb Natapov's message of "Tue, 22 Apr 2008 14:14:13 +0300")
References: <20080421141441.GF7771@minantech.com> <20080422111412.GH7771@minantech.com>

> I suppose "if" below depends on updated refcnt, so update can't be moved
> down without changing the "if" statement.

Yes, good point. And also I think we need to undo splitting/merging if we fail to do the operation. This all needs more care.

From andrea at qumranet.com Tue Apr 22 09:46:15 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 18:46:15 +0200
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
In-Reply-To: <480E0642.6080109@cosmosbay.com>
Message-ID: <20080422164615.GG24536@duo.random>

On Tue, Apr 22, 2008 at 05:37:38PM +0200, Eric Dumazet wrote:
> I am saying your intent was probably to test
>
> 	else if ((unsigned long)*(spinlock_t **)a ==
> 		 (unsigned long)*(spinlock_t **)b)
> 		return 0;

Indeed...

> Hum, it's not a micro-optimization, but a bug fix. :)

The good thing is that even if this bug would lead to a system crash, it would still be zero risk for everybody that isn't using KVM/GRU actively with mmu notifiers. The important thing is that this patch has zero risk of introducing regressions into the kernel, both when enabled and disabled; it's like a new driver. I'll shortly resend 1/12 and likely 12/12 for theoretical correctness. For now you can go ahead testing with this patch, as it'll work fine despite the bug (if it wasn't the case I would have noticed already ;).

From PHF at zurich.ibm.com Tue Apr 22 10:02:17 2008
From: PHF at zurich.ibm.com (Philip Frey1)
Date: Tue, 22 Apr 2008 19:02:17 +0200
Subject: [ofa-general] CM ID

I have realised that the verbs of the rdma_cm_id are only valid after a call to rdma_resolve_addr(). How can I create a memory region before connecting to the remote host? In order to create an ibv_mr, I need a protection domain (PD). For creating a PD, I need an ibv_context, which I get from cm_id->verbs, but it is only valid after resolving the address.

So what would be the correct way to call ibv_alloc_pd() and ibv_reg_mr() before resolving the address, which I might not yet know (especially on the server side)?

Many thanks,
Philip

From rdreier at cisco.com Tue Apr 22 10:23:38 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 22 Apr 2008 10:23:38 -0700
Subject: [ofa-general] CM ID
In-Reply-To: (Philip Frey1's message of "Tue, 22 Apr 2008 19:02:17 +0200")

> I have realised that the verbs of the rdma_cm_id are only valid after a
> call to rdma_resolve_addr().
>
> How can I create a memory region before connecting to the remote host?
> In order to create an ibv_mr, I need a protection domain (PD).
> For creating a PD, I need an ibv_context which I get from cm_id->verbs but > they are only valid after > resolving the address. > > So what would be the correct way to call ibv_alloc_pd() and ibv_reg_mr() > before resolving the address > which I might not yet know (especially on the server side). It doesn't really make sense to use any verbs before you have resolved the address, because you don't know which device will be used until the address is resolved. - R. From rdreier at cisco.com Tue Apr 22 10:24:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 10:24:37 -0700 Subject: [ofa-general] Re: add device capabilities flag to indicate support in creation of UC QPs which are attached to a SRQ In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903DA9BD6@mtlexch01.mtl.com> (Ronni Zimmermann's message of "Tue, 22 Apr 2008 17:17:57 +0300") References: <6C2C79E72C305246B504CBA17B5500C903DA9BD6@mtlexch01.mtl.com> Message-ID: > According to the IB spec release 1.2.1 (section 11-7.2-1.1), an HCA can > support attachment of UC QPs to a SRQ. > Since it's possible for an HCA to support SRQs without supporting > attachment of UC QPs to them, I believe we need a new device > capabilities flag to indicate whether or not the device supports this > operation. OK I guess, although we seem to be using up device capability flags at an alarming rate. I guess in the not-too-distant future we'll have to extend the API to allow more capabilities. - R.
From sean.hefty at intel.com Tue Apr 22 11:17:11 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 11:17:11 -0700 Subject: [ofa-general] beginner resources In-Reply-To: <6978b4af0804220309t1ae34185y83ba69f9bbfa309b@mail.gmail.com> References: <6978b4af0804220309t1ae34185y83ba69f9bbfa309b@mail.gmail.com> Message-ID: <000101c8a4a5$16208c10$40fc070a@amr.corp.intel.com> > is this the right list to ask totally beginner questions > (even code snippets) or is there any other resource for this matter? Beginner questions are fine. But you may be directed to a spec, RFC, man page, etc. Code examples are available with the userspace libraries (libibverbs, librdmacm) that may help. The libraries also provide man pages for the various APIs. - Sean From holt at sgi.com Tue Apr 22 11:22:13 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 13:22:13 -0500 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: References: Message-ID: <20080422182213.GS22493@sgi.com> I believe the differences between your patch set and Christoph's need to be understood and a compromise approach agreed upon. Those differences, as I understand them, are: 1) invalidate_page: You retain an invalidate_page() callout. I believe we have progressed that discussion to the point that it requires some direction for Andrew, Linus, or somebody in authority. The basics of the difference distill down to no expected significant performance difference between the two. The invalidate_page() callout potentially can simplify GRU code. It does provide a more complex api for the users of mmu_notifier which, IIRC, Christoph had interpreted from one of Andrew's earlier comments as being undesirable. I vaguely recall that sentiment as having been expressed.
2) Range callout names: Your range callouts are invalidate_range_start and invalidate_range_end whereas Christoph's are start and end. I do not believe this has been discussed in great detail. I know I have expressed a preference for your names. I admit to having failed to follow up on this issue. I certainly believe we could come to an agreement quickly if pressed. 3) The structure of the patch set: Christoph's upcoming release orders the patches so the prerequisite patches are separately reviewable and each file is only touched by a single patch. Additionally, that allows mmu_notifiers to be introduced as a single patch with sleeping functionality from its inception and an API which remains unchanged. Your patch set, however, introduces one API, then turns around and changes that API. Again, the desire to make it an unchanging API was expressed by, IIRC, Andrew. This does represent a risk to XPMEM as the non-sleeping API may become entrenched and make acceptance of the sleeping version less acceptable. Can we agree upon this list of issues? Thank you, Robin Holt From sean.hefty at intel.com Tue Apr 22 11:25:10 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 11:25:10 -0700 Subject: [ofa-general][PATCH] mlx4: Prereserved Qp regions (MP support, Patch4) In-Reply-To: <480D8803.1050404@mellanox.co.il> References: <480D8803.1050404@mellanox.co.il> Message-ID: <000201c8a4a6$334addd0$40fc070a@amr.corp.intel.com> >We reserve Qp ranges to be used by other modules in case >the ports come up as Ethernet ports. >The qps are reserved at the end of the QP table. >(This way we assure that they are alligned to their size) Can you explain this in more detail? What are the 'other modules'? Are you reserving specific QP numbers? Are the QPs only reserved when running over Ethernet? Why is this done/needed exactly? I don't really understand the alignment comment, but that's a separate issue for me.
- Sean From sean.hefty at intel.com Tue Apr 22 11:27:18 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 11:27:18 -0700 Subject: ***SPAM*** [ofa-general][PATCH] mlx4: Collapsed CQ support (MPsupport, Patch 9) In-Reply-To: <480DF23A.7090304@mellanox.co.il> References: <480DF23A.7090304@mellanox.co.il> Message-ID: <000301c8a4a6$7fb9d270$40fc070a@amr.corp.intel.com> >Changed cq creation API to support the creation of collapsed cqs. What is a 'collapsed cq'? (mayb you explained this in a different part of the patch series that I haven't looked at yet...) - Sean
From andrea at qumranet.com Tue Apr 22 11:43:35 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 20:43:35 +0200 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: <20080422182213.GS22493@sgi.com> References: <20080422182213.GS22493@sgi.com> Message-ID: <20080422184335.GN24536@duo.random> On Tue, Apr 22, 2008 at 01:22:13PM -0500, Robin Holt wrote: > 1) invalidate_page: You retain an invalidate_page() callout. I believe > we have progressed that discussion to the point that it requires some > direction for Andrew, Linus, or somebody in authority. The basics > of the difference distill down to no expected significant performance > difference between the two. The invalidate_page() callout potentially > can simplify GRU code. It does provide a more complex api for the > users of mmu_notifier which, IIRC, Christoph had interpreted from one > of Andrew's earlier comments as being undesirable. I vaguely recall > that sentiment as having been expressed. invalidate_page as demonstrated in KVM pseudocode doesn't change the locking requirements, and it has the benefit of reducing the window of time the secondary page fault has to be masked and at the same time _halves_ the number of _hooks_ in the VM every time the VM deals with single pages (example: do_wp_page hot path). As long as we can't fully converge because of point 3, I'd rather keep invalidate_page as the better option. But that's by far not a priority to keep. > 2) Range callout names: Your range callouts are invalidate_range_start > and invalidate_range_end whereas Christoph's are start and end. I do not > believe this has been discussed in great detail. I know I have expressed > a preference for your names. I admit to having failed to follow up on > this issue. I certainly believe we could come to an agreement quickly > if pressed. I think using ->start ->end is a mistake; think what happens when we later add mprotect_range_start/end.
Here too I keep the better names only because we can't converge on point 3 (the API will eventually change, like every other kernel internal API; even core things like __free_page have been mostly obsoleted). > 3) The structure of the patch set: Christoph's upcoming release orders > the patches so the prerequisite patches are separately reviewable > and each file is only touched by a single patch. Additionally, that Each file touched by a single patch? I doubt... The split is about the same; the main difference is the merge ordering. I always had the zero-risk part at the head; he moved it to the tail when he incorporated #v12 into his patchset. > allows mmu_notifiers to be introduced as a single patch with sleeping > functionality from its inception and an API which remains unchanged. > Your patch set, however, introduces one API, then turns around and > changes that API. Again, the desire to make it an unchanging API was > expressed by, IIRC, Andrew. This does represent a risk to XPMEM as > the non-sleeping API may become entrenched and make acceptance of the > sleeping version less acceptable. > > Can we agree upon this list of issues? This is a kernel internal API, so it will definitely change over time. It's nothing close to a syscall. Also note: the API is obviously defined in mmu_notifier.h and none of the 2-12 patches touches mmu_notifier.h. So the extension of the method semantics is 100% backwards compatible. My patch order and API backward compatible extension over the patchset is done to allow 2.6.26 to fully support KVM/GRU and 2.6.27 to support XPMEM as well. KVM/GRU won't notice any difference once the support for XPMEM is added, but even if the API would completely change in 2.6.27, that's still better than no functionality at all in 2.6.26.
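Andrea's backwards-compatibility argument — extending the semantics of an ops structure without breaking existing registrations — can be sketched generically. The following is an illustrative mock, not the real mmu_notifier definitions: a new optional callback is added at the end of the ops struct, old consumers leave it NULL, and the core falls back to the existing range calls when it is absent.

```c
#include <stddef.h>

/* Illustrative mock of extending a notifier ops struct (NOT the real
 * mmu_notifier API): the new invalidate_page member is optional. */
struct demo_notifier_ops {
    void (*invalidate_range_start)(unsigned long start, unsigned long end);
    void (*invalidate_range_end)(unsigned long start, unsigned long end);
    void (*invalidate_page)(unsigned long addr); /* new, may be NULL */
};

/* Demo counters so the dispatch below is observable. */
static int demo_pages, demo_ranges;
static void demo_page(unsigned long addr) { (void)addr; demo_pages++; }
static void demo_start(unsigned long s, unsigned long e) { (void)s; (void)e; demo_ranges++; }
static void demo_end(unsigned long s, unsigned long e) { (void)s; (void)e; demo_ranges++; }

/* The core invokes the new hook only when a consumer provides it, so
 * consumers written against the old struct layout keep working. */
static void demo_notify_page(const struct demo_notifier_ops *ops,
                             unsigned long addr)
{
    if (ops->invalidate_page) {
        ops->invalidate_page(addr);
    } else {
        ops->invalidate_range_start(addr, addr + 4096);
        ops->invalidate_range_end(addr, addr + 4096);
    }
}
```

A consumer that never heard of invalidate_page simply registers { demo_start, demo_end, NULL } and sees the same range callouts as before.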
From tziporet at dev.mellanox.co.il Tue Apr 22 11:53:26 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 22 Apr 2008 21:53:26 +0300 Subject: ***SPAM*** [ofa-general][PATCH] mlx4: Collapsed CQ support (MPsupport, Patch 9) In-Reply-To: <000301c8a4a6$7fb9d270$40fc070a@amr.corp.intel.com> References: <480DF23A.7090304@mellanox.co.il> <000301c8a4a6$7fb9d270$40fc070a@amr.corp.intel.com> Message-ID: <480E3426.5060907@mellanox.co.il> Sean Hefty wrote: >> Changed cq creation API to support the creation of collapsed cqs. >> > > What is a 'collapsed cq'? (mayb you explained this in a different part of the > patch series that I haven't looked at yet...) > > Collapsed CQ is a HW feature of ConnectX. If you have ConnectX PRM you can read more details about it. Tziporet From rdreier at cisco.com Tue Apr 22 11:55:44 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 11:55:44 -0700 Subject: [ofa-general] [PATCH/RFC] RDMA/nes: Use print_mac() to format ethernet addresses for printing Message-ID: Removing open-coded MAC formats shrinks the source and the generated code too, eg on x86-64: add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-103 (-103) function old new delta make_cm_node 932 912 -20 nes_netdev_set_mac_address 427 406 -21 nes_netdev_set_multicast_list 1148 1124 -24 nes_probe 2349 2311 -38 Signed-off-by: Roland Dreier --- drivers/infiniband/hw/nes/nes.c | 10 ++++------ drivers/infiniband/hw/nes/nes_cm.c | 8 +++----- drivers/infiniband/hw/nes/nes_nic.c | 18 ++++++++---------- 3 files changed, 15 insertions(+), 21 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index b046262..c0671ad 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -353,13 +353,11 @@ struct ib_qp *nes_get_qp(struct ib_device *device, int qpn) */ static void nes_print_macaddr(struct net_device *netdev) { - nes_debug(NES_DBG_INIT, "%s: MAC %02X:%02X:%02X:%02X:%02X:%02X, IRQ %u\n", - netdev->name, 
- netdev->dev_addr[0], netdev->dev_addr[1], netdev->dev_addr[2], - netdev->dev_addr[3], netdev->dev_addr[4], netdev->dev_addr[5], - netdev->irq); -} + DECLARE_MAC_BUF(mac); + nes_debug(NES_DBG_INIT, "%s: %s, IRQ %u\n", + netdev->name, print_mac(mac, netdev->dev_addr), netdev->irq); +} /** * nes_interrupt - handle interrupts diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index d073862..b53bceb 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -1054,6 +1054,7 @@ static struct nes_cm_node *make_cm_node(struct nes_cm_core *cm_core, int arpindex = 0; struct nes_device *nesdev; struct nes_adapter *nesadapter; + DECLARE_MAC_BUF(mac); /* create an hte and cm_node for this instance */ cm_node = kzalloc(sizeof(*cm_node), GFP_ATOMIC); @@ -1116,11 +1117,8 @@ static struct nes_cm_node *make_cm_node(struct nes_cm_core *cm_core, /* copy the mac addr to node context */ memcpy(cm_node->rem_mac, nesadapter->arp_table[arpindex].mac_addr, ETH_ALEN); - nes_debug(NES_DBG_CM, "Remote mac addr from arp table:%02x," - " %02x, %02x, %02x, %02x, %02x\n", - cm_node->rem_mac[0], cm_node->rem_mac[1], - cm_node->rem_mac[2], cm_node->rem_mac[3], - cm_node->rem_mac[4], cm_node->rem_mac[5]); + nes_debug(NES_DBG_CM, "Remote mac addr from arp table: %s\n", + print_mac(mac, cm_node->rem_mac)); add_hte_node(cm_core, cm_node); atomic_inc(&cm_nodes_created); diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 01cd0ef..e5366b0 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -787,16 +787,14 @@ static int nes_netdev_set_mac_address(struct net_device *netdev, void *p) int i; u32 macaddr_low; u16 macaddr_high; + DECLARE_MAC_BUF(mac); if (!is_valid_ether_addr(mac_addr->sa_data)) return -EADDRNOTAVAIL; memcpy(netdev->dev_addr, mac_addr->sa_data, netdev->addr_len); - printk(PFX "%s: Address length = %d, Address = %02X%02X%02X%02X%02X%02X..\n", - 
__func__, netdev->addr_len, - mac_addr->sa_data[0], mac_addr->sa_data[1], - mac_addr->sa_data[2], mac_addr->sa_data[3], - mac_addr->sa_data[4], mac_addr->sa_data[5]); + printk(PFX "%s: Address length = %d, Address = %s\n", + __func__, netdev->addr_len, print_mac(mac, mac_addr->sa_data)); macaddr_high = ((u16)netdev->dev_addr[0]) << 8; macaddr_high += (u16)netdev->dev_addr[1]; macaddr_low = ((u32)netdev->dev_addr[2]) << 24; @@ -878,11 +876,11 @@ static void nes_netdev_set_multicast_list(struct net_device *netdev) if (mc_nic_index < 0) mc_nic_index = nesvnic->nic_index; if (multicast_addr) { - nes_debug(NES_DBG_NIC_RX, "Assigning MC Address = %02X%02X%02X%02X%02X%02X to register 0x%04X nic_idx=%d\n", - multicast_addr->dmi_addr[0], multicast_addr->dmi_addr[1], - multicast_addr->dmi_addr[2], multicast_addr->dmi_addr[3], - multicast_addr->dmi_addr[4], multicast_addr->dmi_addr[5], - perfect_filter_register_address+(mc_index * 8), mc_nic_index); + DECLARE_MAC_BUF(mac); + nes_debug(NES_DBG_NIC_RX, "Assigning MC Address %s to register 0x%04X nic_idx=%d\n", + print_mac(mac, multicast_addr->dmi_addr), + perfect_filter_register_address+(mc_index * 8), + mc_nic_index); macaddr_high = ((u16)multicast_addr->dmi_addr[0]) << 8; macaddr_high += (u16)multicast_addr->dmi_addr[1]; macaddr_low = ((u32)multicast_addr->dmi_addr[2]) << 24; -- 1.5.5.1 From sean.hefty at intel.com Tue Apr 22 12:17:13 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 12:17:13 -0700 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> Message-ID: <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> >Just wanted to be sure you saw this posting from Jason :-) If you >haven't had time to get to it, that's fine but wanted to make sure it >didn't get lost in the email as I've seen messages dropped... Sorry for >the noise. Thanks - I never saw it. 
>So my expectation on how the spec outlines this should work is that >the requesting side does essentially: > ibv_query_device(verbs,&devAttr); > req.responder_resources = devAttr.max_qp_rd_atom; > req.initiator_depth = devAttr.max_qp_init_rd_atom; > >When making the req (assuming it wants the maximum). > >The passive side should then take req.initiator_depth, limit it to its >devAttr.max_qp_rd_atom (and layer a client limit on top of that) and >assign it to max_dest_rd_atomic on its QP, and also assign it to >rep.responder_resources. > >Next, the passive side should take req.responder_resources, limit it >to devAttr.max_qp_init_rd_atom (and again layer a client limit on top of >that), and assign it to max_rd_atomic on its QP, and return it in >rep.initiator_depth. > >The active side should, generally, use the form above and use the >values in the rep to program its max_rd_atomic and max_dest_rd_atomic. > >I can't find any of this in any of the cm libraries - and this is the >sort of thing I was expecting to find in kernel cm.c, since other than >letting the client on the passive side specify lower limits there >really isn't much latitude here. The initiator_depth and responder_resources are controlled by the CM ULP, and are specified when calling send_cm_req / send_cm_rep. The exchanged values are reported through the req_event/rep_event parameters. The behavior that you're describing is done by the kernel cm. Look in ib_send_cm_req / ib_send_cm_rep / cm_req_handler / cm_rep_handler. >The particular change you introduced to support DAPL strikes me as >just strange, overriding the incoming initiator_depth with the passive >side's responder_resources choice and then not returning that change in >the rep makes no sense to me at all and could cause a slow down since >the two ends are now mismatched. The active side initiator_depth and responder_resources are set by the active side when calling ib_send_cm_req.
The passive side initializes its values to the data carried in the REQ. When the passive side sends a REP, it is allowed to reduce the values. The CM adjusts both the passive and active side values based on the data in the REP. Mismatched ends end up with the connection being broken. >(Assuming that max_dest_rd_atomic corresponds to responder resources >and that max_rd_atomic corresponds to initiator depth as discussed in This is correct. - Sean From holt at sgi.com Tue Apr 22 12:42:23 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 14:42:23 -0500 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: <20080422184335.GN24536@duo.random> References: <20080422182213.GS22493@sgi.com> <20080422184335.GN24536@duo.random> Message-ID: <20080422194223.GT22493@sgi.com> On Tue, Apr 22, 2008 at 08:43:35PM +0200, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 01:22:13PM -0500, Robin Holt wrote: > > 1) invalidate_page: You retain an invalidate_page() callout. I believe > > we have progressed that discussion to the point that it requires some > > direction for Andrew, Linus, or somebody in authority.
The basics > > of the difference distill down to no expected significant performance > > difference between the two. The invalidate_page() callout potentially > > can simplify GRU code. It does provide a more complex api for the > > users of mmu_notifier which, IIRC, Christoph had interpreted from one > > of Andrew's earlier comments as being undesirable. I vaguely recall > > that sentiment as having been expressed. > > invalidate_page as demonstrated in KVM pseudocode doesn't change the > locking requirements, and it has the benefit of reducing the window of > time the secondary page fault has to be masked and at the same time > _halves_ the number of _hooks_ in the VM every time the VM deals with > single pages (example: do_wp_page hot path). As long as we can't fully > converge because of point 3, I'd rather keep invalidate_page as the > better option. But that's by far not a priority to keep. Christoph, Jack and I just discussed invalidate_page(). I don't think the point Andrew was making is that compelling in this circumstance. The code has changed fairly remarkably. Would you have any objection to putting it back into your patch/agreeing to it remaining in Andrea's patch? If not, I think we can put this issue aside until Andrew gets out of the merge window and can decide it. Either way, the patches become much more similar with this in. Thanks, Robin From swise at opengridcomputing.com Tue Apr 22 13:00:00 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 22 Apr 2008 15:00:00 -0500 Subject: [ofa-general] Agenda for the OFED meeting today In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Message-ID: <480E43C0.6080107@opengridcomputing.com> An HTML attachment was scrubbed...
URL: From clameter at sgi.com Tue Apr 22 13:19:29 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:19:29 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: Message-ID: Thanks for adding most of my enhancements. But 1. There is no real need for invalidate_page(). Can be done with invalidate_start/end. Needlessly complicates the API. One of the objections by Andrew was that there were multiple callbacks that perform similar functions. 2. The locks that are used are later changed to semaphores. This is f.e. true for mm_lock / mm_unlock. The diffs will be smaller if the lock conversion is done first and then mm_lock is introduced. The way the patches are structured means that reviewers cannot review the final version of mm_lock etc etc. The lock conversion needs to come first. 3. As noted by Eric and also contained in a private post from yesterday by me: The cmp function needs to retrieve the value before doing comparisons, which is not done for the == of a and b. From clameter at sgi.com Tue Apr 22 13:22:55 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:22:55 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug In-Reply-To: <3c804dca25b15017b220.1208872278@duo.random> References: <3c804dca25b15017b220.1208872278@duo.random> Message-ID: Looks like this is not complete. There are numerous .h files missing, which means that various structs are undefined (fs.h and rmap.h are needed f.e.), which leads to surprises when dereferencing fields of these structs. It seems that mm_types.h is expected to be included only in certain contexts. Could you make sure to include all necessary .h files? Or add some docs to clarify the situation here.
From clameter at sgi.com Tue Apr 22 13:23:16 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:23:16 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced In-Reply-To: References: Message-ID: Missing signoff by you. From clameter at sgi.com Tue Apr 22 13:24:21 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:24:21 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: Message-ID: Reverts a part of an earlier patch. Why isn't this merged into 1 of 12? From clameter at sgi.com Tue Apr 22 13:25:09 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:25:09 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 05 of 12] Move the tlb flushing into free_pgtables. The conversion of the locks In-Reply-To: References: Message-ID: Why are the subjects all screwed up? They are the first line of the description instead of the subject line of my patches. From clameter at sgi.com Tue Apr 22 13:26:13 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:26:13 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock In-Reply-To: References: Message-ID: Doing the right patch ordering would have avoided this patch and allowed better review.
From clameter at sgi.com Tue Apr 22 13:28:25 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:28:25 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: <20080422184335.GN24536@duo.random> References: <20080422182213.GS22493@sgi.com> <20080422184335.GN24536@duo.random> Message-ID: On Tue, 22 Apr 2008, Andrea Arcangeli wrote: > My patch order and API backward compatible extension over the patchset > is done to allow 2.6.26 to fully support KVM/GRU and 2.6.27 to support > XPMEM as well. KVM/GRU won't notice any difference once the support > for XPMEM is added, but even if the API would completely change in > 2.6.27, that's still better than no functionality at all in 2.6.26. Please redo the patchset with the right order. To my knowledge there is no chance of this getting merged for 2.6.26. From clameter at sgi.com Tue Apr 22 13:30:53 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:30:53 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: <20080422194223.GT22493@sgi.com> References: <20080422182213.GS22493@sgi.com> <20080422184335.GN24536@duo.random> <20080422194223.GT22493@sgi.com> Message-ID: On Tue, 22 Apr 2008, Robin Holt wrote: > putting it back into your patch/agreeing to it remaining in Andrea's > patch? If not, I think we can put this issue aside until Andrew gets > out of the merge window and can decide it. Either way, the patches > become much more similar with this in. One solution would be to separate the invalidate_page() callout into a patch at the very end that can be omitted. AFAICT there is no compelling reason to have this callback, and it complicates the API for the device driver writers. Not having this callback makes the way that mmu notifiers are called from the VM uniform, which is a desirable goal.
From holt at sgi.com Tue Apr 22 13:31:14 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 15:31:14 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080422203114.GQ30298@sgi.com> On Tue, Apr 22, 2008 at 01:19:29PM -0700, Christoph Lameter wrote: > Thanks for adding most of my enhancements. But > > 1. There is no real need for invalidate_page(). Can be done with > invalidate_start/end. Needlessly complicates the API. One > of the objections by Andrew was that there were multiple > callbacks that perform similar functions. While I agree with that reading of Andrew's email about invalidate_page, I think the GRU hardware makes a strong enough case to justify the two separate callouts. Due to the GRU hardware, we can ensure that invalidate_page terminates all pending GRU faults (that includes faults that are just beginning) and can therefore be completed without needing any locking. The invalidate_page() callout gets turned into a GRU flush instruction and we return. Because the invalidate_range_start() leaves the page table information available, we cannot use a single page _start to mimic that functionality. Therefore, there is a documented case justifying the separate callouts. I agree the case is fairly weak, but it does exist. Given Andrea's unwillingness to move and Jack's documented case, it is my opinion the most likely compromise is to leave in the invalidate_page() callout. Thanks, Robin
From rdreier at cisco.com Tue Apr 22 13:46:38 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 13:46:38 -0700 Subject: [ofa-general] Re: [PATCH] IPoIB 4K MTU support In-Reply-To: <1208681551.5271.11.camel@localhost.localdomain> (Shirley Ma's message of "Sun, 20 Apr 2008 01:52:31 -0700") References: <1208681551.5271.11.camel@localhost.localdomain> Message-ID: Thanks, applied with some cleanups as below. As an aside, in the case where we need to use a fragment in the receive skb, does it make sense to make the initial linear part bigger so the TCP and IP headers fit there (and the kernel doesn't have to look into the fragment list to handle the packet)? Also, is there any clean way where a kernel with PAGE_SIZE > 4096 can have ud_need_sg evaluate to 0 at compile time, so that all the unneeded code can be thrown out by the compiler? > + return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0; I've never understood this style: it makes no sense to do return bool ?
1 : 0; instead of just return bool; > +static inline void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv, > + u64 mapping[IPOIB_UD_RX_SG]) > +{ > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE); > + ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE); > + } else > + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE); > +} > + > +static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv, > + struct sk_buff *skb, > + unsigned int length) > +{ > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > + skb_frag_t *frag = &skb_shinfo(skb)->frags[0]; > + /* > + * There is only two buffers needed for max_payload = 4K, > + * first buf size is IPOIB_UD_HEAD_SIZE > + */ > + skb->tail += IPOIB_UD_HEAD_SIZE; > + frag->size = length - IPOIB_UD_HEAD_SIZE; > + skb->data_len += frag->size; > + skb->truesize += frag->size; > + skb->len += length; > + } else > + skb_put(skb, length); > + > +} These are pretty big to put in a header file as inlines... I moved them to the only .c file where they're used. - R. From rdreier at cisco.com Tue Apr 22 13:55:11 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 13:55:11 -0700 Subject: [ofa-general] Re: [PATCH 5/5] IB/ehca: Bump version number to 0026 In-Reply-To: <200804211008.17023.fenkes@de.ibm.com> (Joachim Fenkes's message of "Mon, 21 Apr 2008 09:08:16 +0100") References: <200804211003.10695.fenkes@de.ibm.com> <200804211008.17023.fenkes@de.ibm.com> Message-ID: thanks, applied all 5. 
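Roland's style point above — a comparison already yields 0 or 1, so a trailing "? 1 : 0" is redundant — can be shown with a small standalone sketch. The buffer-size calculation and page size here are simplified stand-ins (an assumed 40-byte GRH overhead), not the exact definitions from the quoted patch:

```c
/* Simplified stand-ins; the real IPOIB_UD_BUF_SIZE() and PAGE_SIZE
 * live in the patch and kernel headers and may differ. */
#define DEMO_PAGE_SIZE 4096
#define DEMO_GRH_BYTES 40 /* assumed per-packet GRH overhead */
#define DEMO_UD_BUF_SIZE(ib_mtu) ((ib_mtu) + DEMO_GRH_BYTES)

/* Roland's suggested form: the > comparison itself evaluates to
 * 0 or 1, so the function can return it directly. */
static int demo_ud_need_sg(unsigned int ib_mtu)
{
    return DEMO_UD_BUF_SIZE(ib_mtu) > DEMO_PAGE_SIZE;
}
```

With a 2048-byte MTU the buffer fits in one page and the predicate is 0; with a 4096-byte MTU the GRH overhead pushes it past the page size and the predicate is 1.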
From jgunthorpe at obsidianresearch.com Tue Apr 22 14:00:49 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 22 Apr 2008 15:00:49 -0600 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> Message-ID: <20080422210049.GA17925@obsidianresearch.com> On Tue, Apr 22, 2008 at 12:17:13PM -0700, Sean Hefty wrote: > >I can't find any of this in any of the cm libraries - and this is the > >sort of thing I was expecting to find in kernel cm.c, since other than > >letting the client on the passive side specify lower limits there > >really isn't much latitude here. > The initiator_depth and responder_resources are controlled by the CM > ULP, and are specified when calling send_cm_req / send_cm_rep. The > exchanged values are reported through the req_event/rep_event > parameters. Yes, but the actual programming of the values into the QP is done by cm_init_qp_rtr_attr/cm_init_qp_rts_attr (well, in many cases) - which takes the values from the rep/req directly, without modification. Look at for instance the entire stack: none of SRP, ISER or IPOIB touch max_*_rd_atomic; they all rely on cm_init_*_attr to set them properly. I guess these are not entirely good examples since they are generally not acting as the passive side (I don't have the target patches for SRP/ISER handy..) There is a bug here, it just isn't really obvious to me where the fixes should go to match the CM design. I was imagining that cm.c would adjust the REQ after reception, but there may be some downsides to that? > The behavior that you're describing is done by the kernel cm. Look in > ib_send_cm_req / ib_send_cm_rep / cm_req_handler / cm_rep_handler. All that I see in here is switching the REQ's responder_resources value into the REQ's initiator_depth value (and vice versa); it does not limit it. 
> The active side initiator_depth and responder_resources are set by > the active side when calling ib_send_cm_req. The passive side > initializes its values to the data carried in the REQ. When the > passive side sends a REP, it is allowed to reduce the values. The > CM adjusts both the passive and active side values based on the data > in the REP. Well, I see how the override gets into the REP, but how does the REQ get factored into the override? For instance, the rping example does this: memset(&conn_param, 0, sizeof conn_param); conn_param.responder_resources = 1; conn_param.initiator_depth = 1; ret = rdma_accept(cb->child_cm_id, &conn_param); And rdma_accept does: ret = ucma_valid_param(id_priv, conn_param); [^^ Only checks local device capabilities] ret = ucma_modify_qp_rtr(id, conn_param); [.. then on to ucma_modify_qp_rtr .. ] if (conn_param) qp_attr.max_dest_rd_atomic = conn_param->responder_resources; return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); Which just can't be entirely right. The client can specify values that are greater than those specified in the REQ. Since the client doesn't seem to have access to the REQ prior to calling rdma_accept the responsibility to limit the values must fall on librdmacm. Maybe something more like this in ucma_modify_qp_rtr: if (conn_param) { /* Note: at this point qp_attr.max_dest_rd_atomic is REQ.initiator_depth. */ conn_param->responder_resources = min(conn_param->responder_resources, qp_attr.max_dest_rd_atomic, id_priv->cma_dev->max_responder_resources); qp_attr.max_dest_rd_atomic = conn_param->responder_resources; /* Note: at this point qp_attr.max_rd_atomic is REQ.responder_resources. 
*/ conn_param->initiator_depth = min(conn_param->initiator_depth, qp_attr.max_rd_atomic, id_priv->cma_dev->max_initiator_depth); qp_attr.max_rd_atomic = conn_param->initiator_depth; } i.e., consider the REQ values as reported through rdma_init_qp_attr, and limit the user's requested values on the passive side to be no greater than what the remote can do. Also support user passive side control over initiator depth. A similar kind of problem exists in the normal CM. Thanks, Jason From weiny2 at llnl.gov Tue Apr 22 14:06:01 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 22 Apr 2008 14:06:01 -0700 Subject: [ofa-general] [PATCH] opensm/configure.in: Fix the QOS and prefix routes config file default locations Message-ID: <20080422140601.64764e18.weiny2@llnl.gov> >From ef37654c0917875129fa2bad2e8ee0dd0d3f8859 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Fri, 18 Apr 2008 15:51:58 -0700 Subject: [PATCH] opensm/configure.in: Fix the QOS and prefix routes config file default locations Signed-off-by: Ira K. 
Weiny --- opensm/configure.in | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index a527c91..d36d7be 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -162,7 +162,7 @@ AC_ARG_WITH(qos-policy-conf, ) AC_MSG_RESULT($QOS_POLICY_FILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_QOS_POLICY_FILE, - ["$OPENSM_CONFIG/$QOS_POLICY_FILE"], + ["$OPENSM_CONFIG_DIR/$QOS_POLICY_FILE"], [Define a QOS policy config file]) AC_SUBST(QOS_POLICY_FILE) @@ -182,7 +182,7 @@ AC_ARG_WITH(prefix-routes-conf, ) AC_MSG_RESULT($PREFIX_ROUTES_FILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_PREFIX_ROUTES_FILE, - ["$OPENSM_CONFIG/$PREFIX_ROUTES_FILE"], + ["$OPENSM_CONFIG_DIR/$PREFIX_ROUTES_FILE"], [Define a Prefix Routes config file]) AC_SUBST(PREFIX_ROUTES_FILE) -- 1.5.1 From or.gerlitz at gmail.com Tue Apr 22 14:01:37 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Wed, 23 Apr 2008 00:01:37 +0300 Subject: [ofa-general] arp or ip patch to build a neigh permanent entry for IPoIB In-Reply-To: <1208812763.22166.4.camel@localhost.localdomain> References: <1208812763.22166.4.camel@localhost.localdomain> Message-ID: <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> On 4/22/08, Shirley Ma wrote: > > I am debugging an ipoib ping problem on a cluster. The arp and ip commands > don't support using a 20-byte HW address to build a permanent entry manually. > Can someone give me the pointer to the patch if any? > see http://lists.openfabrics.org/pipermail/general/2006-March/018487.html James, any news on this? Does something need to be patched into ip/arp to make this possible? Or. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Tue Apr 22 14:20:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 14:20:25 -0700 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) In-Reply-To: <480D8660.3060001@mellanox.co.il> (Yevgeny Petrilin's message of "Tue, 22 Apr 2008 09:32:00 +0300") References: <480D8660.3060001@mellanox.co.il> Message-ID: thanks, applied, except: > diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h > index ff7df1a..9c87dd3 100644 > --- a/include/linux/mlx4/device.h > +++ b/include/linux/mlx4/device.h > +#include > + > +struct mlx4_user_db_page { > + struct list_head list; > + struct ib_umem *umem; > + unsigned long user_virt; > + int refcnt; > +}; I didn't see any reason to move this into generic core code, so I left it where it was. From arlin.r.davis at intel.com Tue Apr 22 14:28:03 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 22 Apr 2008 14:28:03 -0700 Subject: [ofa-general] [PATCH 1/1][v1.2] dapl: evd_alloc doesn't check for ib_wait_object_create errors. Message-ID: Fix error check in dapls_ib_wait_object_create() and dat_evd_alloc. When attempting to create large number of evd's that exceed open files limit the error was not propagated up causing a segfault. Note: there are 3 FD's required for each EVD 2 for pipe, and one for ibv_comp_channel. Change the error reporting to indicate correct return code and log with non-debug builds. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_util.c | 5 +++++ dapl/openib_cma/dapl_ib_cq.c | 4 ++-- dapl/openib_cma/dapl_ib_util.h | 4 +--- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index 39a8dd9..36b776c 100644 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -243,6 +243,11 @@ dapls_evd_alloc ( ((evd_flags & ~ (DAT_EVD_DTO_FLAG|DAT_EVD_RMR_BIND_FLAG)) == 0 )) { dapls_ib_wait_object_create (evd_ptr, &evd_ptr->cq_wait_obj_handle); + if (evd_ptr->cq_wait_obj_handle == NULL) { + dapl_os_free(evd_ptr, sizeof (DAPL_EVD)); + evd_ptr = NULL; + goto bail; + } } #endif diff --git a/dapl/openib_cma/dapl_ib_cq.c b/dapl/openib_cma/dapl_ib_cq.c index ab4eafc..25b4551 100644 --- a/dapl/openib_cma/dapl_ib_cq.c +++ b/dapl/openib_cma/dapl_ib_cq.c @@ -250,7 +250,7 @@ dapls_ib_cq_alloc(IN DAPL_IA *ia_ptr, channel, 0); if (evd_ptr->ib_cq_handle == IB_INVALID_HANDLE) - return DAT_INSUFFICIENT_RESOURCES; + return(dapl_convert_errno(errno,"create_cq")); /* arm cq for events */ dapls_set_cq_notify(ia_ptr, evd_ptr); @@ -469,7 +469,7 @@ dapls_ib_wait_object_create(IN DAPL_EVD *evd_ptr, bail: dapl_os_free(*p_cq_wait_obj_handle, sizeof(struct _ib_wait_obj_handle)); - + *p_cq_wait_obj_handle = NULL; return(dapl_convert_errno(errno," wait_object_create")); } diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 457d26b..93f4fde 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -314,11 +314,9 @@ dapl_convert_errno( IN int err, IN const char *str ) { if (!err) return DAT_SUCCESS; -#if DAPL_DBG if ((err != EAGAIN) && (err != ETIME) && (err != ETIMEDOUT) && (err != EINTR)) - dapl_dbg_log (DAPL_DBG_TYPE_ERR," %s %s\n", str, strerror(err)); -#endif + dapl_log (DAPL_DBG_TYPE_ERR," %s %s\n", str, strerror(err)); switch( err ) { -- 1.5.2.5 From arlin.r.davis at intel.com Tue Apr 22 14:28:19 2008 From: 
arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 22 Apr 2008 14:28:19 -0700 Subject: [ofa-general] [PATCH 1/1][v2.0] dapl: evd_alloc doesn't check for ib_wait_object_create errors. Message-ID: <001c01c8a4bf$c8f3d170$9f97070a@amr.corp.intel.com> Fix error check in dapls_ib_wait_object_create() and dat_evd_alloc. When attempting to create large number of evd's that exceed open files limit the error was not propagated up causing a segfault. Note: there are 3 FD's required for each EVD 2 for pipe, and one for ibv_comp_channel. Change the error reporting to indicate correct return code and log with non-debug builds. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_util.c | 5 +++++ dapl/openib_cma/dapl_ib_cq.c | 4 ++-- dapl/openib_cma/dapl_ib_util.h | 4 +--- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index 2ae1b59..32fbaba 100755 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -301,6 +301,11 @@ dapls_evd_alloc ( ((evd_flags & ~ (DAT_EVD_DTO_FLAG|DAT_EVD_RMR_BIND_FLAG)) == 0 )) { dapls_ib_wait_object_create (evd_ptr, &evd_ptr->cq_wait_obj_handle); + if (evd_ptr->cq_wait_obj_handle == NULL) { + dapl_os_free(evd_ptr, sizeof (DAPL_EVD)); + evd_ptr = NULL; + goto bail; + } } #endif diff --git a/dapl/openib_cma/dapl_ib_cq.c b/dapl/openib_cma/dapl_ib_cq.c index f63c9a7..d7b3309 100755 --- a/dapl/openib_cma/dapl_ib_cq.c +++ b/dapl/openib_cma/dapl_ib_cq.c @@ -239,7 +239,7 @@ dapls_ib_cq_alloc(IN DAPL_IA *ia_ptr, channel, 0); if (evd_ptr->ib_cq_handle == IB_INVALID_HANDLE) - return DAT_INSUFFICIENT_RESOURCES; + return(dapl_convert_errno(errno,"create_cq")); /* arm cq for events */ dapls_set_cq_notify(ia_ptr, evd_ptr); @@ -458,7 +458,7 @@ dapls_ib_wait_object_create(IN DAPL_EVD *evd_ptr, bail: dapl_os_free(*p_cq_wait_obj_handle, sizeof(struct _ib_wait_obj_handle)); - + *p_cq_wait_obj_handle = NULL; return(dapl_convert_errno(errno," 
wait_object_create")); } diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 370f3b1..71593fd 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -305,11 +305,9 @@ dapl_convert_errno( IN int err, IN const char *str ) { if (!err) return DAT_SUCCESS; -#if DAPL_DBG if ((err != EAGAIN) && (err != ETIME) && (err != ETIMEDOUT) && (err != EINTR)) - dapl_dbg_log (DAPL_DBG_TYPE_ERR," %s %s\n", str, strerror(err)); -#endif + dapl_log (DAPL_DBG_TYPE_ERR," %s %s\n", str, strerror(err)); switch( err ) { -- 1.5.2.5 From mashirle at us.ibm.com Tue Apr 22 07:17:28 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Tue, 22 Apr 2008 07:17:28 -0700 Subject: [ofa-general] Re: [PATCH] IPoIB 4K MTU support In-Reply-To: References: <1208681551.5271.11.camel@localhost.localdomain> Message-ID: <1208873848.14172.1.camel@localhost.localdomain> Hello Roland, On Tue, 2008-04-22 at 13:46 -0700, Roland Dreier wrote: > Thanks, applied with some cleanups as below. Thanks! > As an aside, in the case where we need to use a fragment in the receive > skb, does it make sense to make the initial linear part bigger so the > TCP and IP headers fit there (and the kernel doesn't have to look into > the fragment list to handle the packet)? We can improve this later. > Also, is there any clean way where a kernel with PAGE_SIZE > 4096 can > have ud_need_sg evaluate to 0 at compile time, so that all the unneeded > code can be thrown out by the compiler? > > > + return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0; > > I've never understood this style: it makes no sense to do > > return bool ? 1 : 0; > > instead of just > > return bool; You are right. 
> > +static inline void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv, > > + u64 mapping[IPOIB_UD_RX_SG]) > > +{ > > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > > + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE); > > + ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE); > > + } else > > + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE); > > +} > > + > > +static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv, > > + struct sk_buff *skb, > > + unsigned int length) > > +{ > > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > > + skb_frag_t *frag = &skb_shinfo(skb)->frags[0]; > > + /* > > + * There is only two buffers needed for max_payload = 4K, > > + * first buf size is IPOIB_UD_HEAD_SIZE > > + */ > > + skb->tail += IPOIB_UD_HEAD_SIZE; > > + frag->size = length - IPOIB_UD_HEAD_SIZE; > > + skb->data_len += frag->size; > > + skb->truesize += frag->size; > > + skb->len += length; > > + } else > > + skb_put(skb, length); > > + > > +} > > These are pretty big to put in a header file as inlines... I moved them > to the only .c file where they're used. > > - R. Right. I should have moved it into .c file from Or's comment. I forgot. Thanks. Shirley From sean.hefty at intel.com Tue Apr 22 15:23:30 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 15:23:30 -0700 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <20080422210049.GA17925@obsidianresearch.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> Message-ID: <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> >Yes, but the actual programming of the values into the QP is done by >cm_init_qp_rtr_attr/cm_init_qp_rts_attr (well, in many cases) - which >takes the values from the rep/req directly, without modification. 
The values exchanged in the REP are saved to cm_id_priv. Those values are used. The passive side ULP is responsible for using the correct value. Either by returning what was sent in the REQ, or by adjusting the values down. Note that the active side will see the values in the REP and can reject the connection if they are set too large. >There is a bug here, it just isn't really obvious to me where the >fixes should go to match the CM design. I was imagining that cm.c >would adjust the REQ after reception, but there may be some downsides >to that? The CM does adjust the value in the cm_id_priv structure based on the REP. >All that I see in here is switching REQ's responder_resources value >into the REQ's initiator_depth value (and vice versa) it does not >limit it. The limits are left up to the ULP. Maybe the problem is that the ULPs are not validating the limits? >Well, I see how the override gets into the REP, but how does the REQ >get factored into the override? For instance, the rping example does >this: > > memset(&conn_param, 0, sizeof conn_param); > conn_param.responder_resources = 1; > conn_param.initiator_depth = 1; > ret = rdma_accept(cb->child_cm_id, &conn_param); > >And rdma_accept does: > > ret = ucma_valid_param(id_priv, conn_param); > [^^ Only checks local device capabilities] This is a sanity check only, intended to help catch errors sooner. Since it is also used on the active side before sending a REQ, it can only check against the local device capabilities. The sanity check could be expanded, but I don't see a strong reason to add it. The modify QP operations will fail later if the specified values are too large. > ret = ucma_modify_qp_rtr(id, conn_param); >[.. then on to ucma_modify_qp_rtr .. ] > if (conn_param) > qp_attr.max_dest_rd_atomic = conn_param->responder_resources; > return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); > >Which just can't be entirely right. The client can specify values that >are greater than those specified in the REQ. 
Since the client doesn't >seem to have access to the REQ prior to calling rdma_accept the >responsibility to limit the values must fall on librdmacm. The rdma_conn_param structure reported as part of a connection event carries the initiator_depth and responder_resources fields in the REQ. Yes, the client can specify values that were greater than those in the REQ, but those values may technically still work. >Maybe something more like this in ucma_modify_qp_rtr: > >if (conn_param) { > /* Note: at this point qp_attr.max_dest_rd_atomic is > REQ.initiator_depth. */ > conn_param->responder_resources = min(conn_param->responder_resources, > qp_attr.max_dest_rd_atomic, > id_priv->cma_dev->max_responder_resources); > qp_attr.max_dest_rd_atomic = conn_param->responder_resources; > > /* Note: at this point qp_attr.max_rd_atomic is > REQ.responder_resources. */ > conn_param->initiator_depth = min(conn_param->initiator_depth, > qp_attr.max_rd_atomic, > id_priv->cma_dev->max_initiator_depth); > qp_attr.max_rd_atomic = conn_param->initiator_depth; >} > >ie, consider the REQ values as reported through rdma_init_qp_attr, >and limit the user's requested values on the passive side to be no >greater than what the remote can do. Also support user passive side control over initiator depth. This is there today. I think I'm missing whatever problem you're seeing. - Sean From rdreier at cisco.com Tue Apr 22 15:29:51 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 15:29:51 -0700 Subject: [ofa-general] [PATCH/RFC] RDMA/nes: Print IPv4 addresses in a readable format Message-ID: Use NIPQUAD_FMT instead of printing raw 32-bit hex quantities in debugging output. 
Signed-off-by: Roland Dreier --- drivers/infiniband/hw/nes/nes.c | 5 +++-- drivers/infiniband/hw/nes/nes_cm.c | 13 +++++++------ drivers/infiniband/hw/nes/nes_utils.c | 4 +++- 3 files changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index c0671ad..a4e9269 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -139,8 +139,9 @@ static int nes_inetaddr_event(struct notifier_block *notifier, addr = ntohl(ifa->ifa_address); mask = ntohl(ifa->ifa_mask); - nes_debug(NES_DBG_NETDEV, "nes_inetaddr_event: ip address %08X, netmask %08X.\n", - addr, mask); + nes_debug(NES_DBG_NETDEV, "nes_inetaddr_event: ip address " NIPQUAD_FMT + ", netmask " NIPQUAD_FMT ".\n", + HIPQUAD(addr), HIPQUAD(mask)); list_for_each_entry(nesdev, &nes_dev_list, list) { nes_debug(NES_DBG_NETDEV, "Nesdev list entry = 0x%p. (%s)\n", nesdev, nesdev->netdev[0]->name); diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index b53bceb..38ea14c 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -852,8 +852,8 @@ static struct nes_cm_node *find_node(struct nes_cm_core *cm_core, /* get a handle on the hte */ hte = &cm_core->connected_nodes; - nes_debug(NES_DBG_CM, "Searching for an owner node:%x:%x from core %p->%p\n", - loc_addr, loc_port, cm_core, hte); + nes_debug(NES_DBG_CM, "Searching for an owner node: " NIPQUAD_FMT ":%x from core %p->%p\n", + HIPQUAD(loc_addr), loc_port, cm_core, hte); /* walk list and find cm_node associated with this session ID */ spin_lock_irqsave(&cm_core->ht_lock, flags); @@ -902,8 +902,8 @@ static struct nes_cm_listener *find_listener(struct nes_cm_core *cm_core, } spin_unlock_irqrestore(&cm_core->listen_list_lock, flags); - nes_debug(NES_DBG_CM, "Unable to find listener- %x:%x\n", - dst_addr, dst_port); + nes_debug(NES_DBG_CM, "Unable to find listener for " NIPQUAD_FMT ":%x\n", + HIPQUAD(dst_addr), 
dst_port); /* no listener */ return NULL; @@ -1067,8 +1067,9 @@ static struct nes_cm_node *make_cm_node(struct nes_cm_core *cm_core, cm_node->loc_port = cm_info->loc_port; cm_node->rem_port = cm_info->rem_port; cm_node->send_write0 = send_first; - nes_debug(NES_DBG_CM, "Make node addresses : loc = %x:%x, rem = %x:%x\n", - cm_node->loc_addr, cm_node->loc_port, cm_node->rem_addr, cm_node->rem_port); + nes_debug(NES_DBG_CM, "Make node addresses : loc = " NIPQUAD_FMT ":%x, rem = " NIPQUAD_FMT ":%x\n", + HIPQUAD(cm_node->loc_addr), cm_node->loc_port, + HIPQUAD(cm_node->rem_addr), cm_node->rem_port); cm_node->listener = listener; cm_node->netdev = nesvnic->netdev; cm_node->cm_id = cm_info->cm_id; diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c index f9db07c..c6d5631 100644 --- a/drivers/infiniband/hw/nes/nes_utils.c +++ b/drivers/infiniband/hw/nes/nes_utils.c @@ -660,7 +660,9 @@ int nes_arp_table(struct nes_device *nesdev, u32 ip_addr, u8 *mac_addr, u32 acti /* DELETE or RESOLVE */ if (arp_index == nesadapter->arp_table_size) { - nes_debug(NES_DBG_NETDEV, "mac address not in ARP table - cannot delete or resolve\n"); + nes_debug(NES_DBG_NETDEV, "MAC for " NIPQUAD_FMT " not in ARP table - cannot %s\n", + HIPQUAD(ip_addr), + action == NES_ARP_RESOLVE ? "resolve" : "delete"); return -1; } -- 1.5.5.1 From andrea at qumranet.com Tue Apr 22 15:35:45 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:35:45 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080422223545.GP24536@duo.random> On Tue, Apr 22, 2008 at 01:19:29PM -0700, Christoph Lameter wrote: > 3. As noted by Eric and also contained in private post from yesterday by > me: The cmp function needs to retrieve the value before > doing comparisons which is not done for the == of a and b. I retrieved the value, which is why mm_lock works perfectly on #v13 as well as #v12. 
It's not mandatory to ever return 0, so it won't produce any runtime error (there is a bugcheck for wrong sort ordering in my patch just in case it would generate any runtime error and it never did, or I would have noticed before submission), which is why I didn't need to release any hotfix yet and I'm waiting more time to get more comments before sending an update to clean up that bit. Mentioning this as the third and last point I guess shows how strong your arguments are against merging my mmu-notifier-core now, so in the end making that cosmetic error paid off somehow. I'll send an update in any case to Andrew way before Saturday so hopefully we'll finally get mmu-notifiers-core merged before next week. Also I'm not updating my mmu-notifier-core patch anymore except for strict bugfixes so don't worry about any more cosmetic bugs being introduced while optimizing the code like it happened this time. The only other change I did has been to move mmu_notifier_unregister at the end of the patchset after getting more questions about its reliability and I documented a bit the rmmod requirements for ->release. we'll think later if it makes sense to add it, nobody's using it anyway. From andrea at qumranet.com Tue Apr 22 15:37:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:37:27 +0200 Subject: [ofa-general] Re: [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced In-Reply-To: References: Message-ID: <20080422223727.GQ24536@duo.random> On Tue, Apr 22, 2008 at 01:23:16PM -0700, Christoph Lameter wrote: > Missing signoff by you. I thought I had to sign off if I contributed with anything that could resemble copyright? Given I only merged that patch, I can add an Acked-by if you like, but merging this in my patchset was already an implicit ack ;-). 
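The comparator issue in the mm_lock exchange above — fetch the pointed-to values before comparing, and decide deliberately what equal elements return — can be sketched as a qsort(3)-style comparator. This is a hypothetical userspace stand-in, not the actual patch code; names are illustrative.

```c
/* qsort(3)-style comparator for an array of pointers (e.g. lock
 * addresses, as mm_lock sorts them).  Hypothetical sketch, not the
 * code under review.
 */
static int ptr_cmp(const void *a, const void *b)
{
	/* Fetch the pointed-to values first: comparing a and b
	 * themselves would order the array slots, not their contents. */
	unsigned long va = (unsigned long)*(void *const *)a;
	unsigned long vb = (unsigned long)*(void *const *)b;

	/* Explicit three-way compare.  Returning (int)(va - vb) could
	 * overflow or truncate on 64-bit, inverting the sort order. */
	if (va < vb)
		return -1;
	if (va > vb)
		return 1;
	return 0;	/* duplicates compare equal; sort leaves them adjacent */
}
```

Usage would be `qsort(locks, n, sizeof(void *), ptr_cmp);`. As Andrea notes, never returning 0 is tolerable for correctness of the sort itself, but the explicit equal case costs nothing and keeps the comparator a valid total order.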
From andrea at qumranet.com Tue Apr 22 15:40:48 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:40:48 +0200 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: Message-ID: <20080422224048.GR24536@duo.random> On Tue, Apr 22, 2008 at 01:24:21PM -0700, Christoph Lameter wrote: > Reverts a part of an earlier patch. Why isnt this merged into 1 of 12? To give zero regression risk to 1/12 when MMU_NOTIFIER=y or =n and the mmu notifiers aren't registered by GRU or KVM. Keep in mind that the whole point of my proposed patch ordering from day 0, is to keep as 1/N, the absolutely minimum change that fully satisfy GRU and KVM requirements. 4/12 isn't required by GRU/KVM so I keep it in a later patch. I now moved mmu_notifier_unregister in a later patch too for the same reason. From andrea at qumranet.com Tue Apr 22 15:43:52 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:43:52 +0200 Subject: [ofa-general] Re: [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug In-Reply-To: References: <3c804dca25b15017b220.1208872278@duo.random> Message-ID: <20080422224352.GS24536@duo.random> On Tue, Apr 22, 2008 at 01:22:55PM -0700, Christoph Lameter wrote: > Looks like this is not complete. There are numerous .h files missing which > means that various structs are undefined (fs.h and rmap.h are needed > f.e.) which leads to surprises when dereferencing fields of these struct. > > It seems that mm_types.h is expected to be included only in certain > contexts. Could you make sure to include all necessary .h files? Or add > some docs to clarify the situation here. Robin, what other changes did you need to compile? I only did that one because I didn't hear any more feedback from you after I sent that patch, so I assumed it was enough. 
From sean.hefty at intel.com Tue Apr 22 15:46:51 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 15:46:51 -0700 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets Message-ID: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> I have a need to start looking at possible ways to map IP address to GIDs when crossing IP (and IB) subnets. This would be in addition to or replace the ARP use by the rdma_cm. Possibilities include: * Use some standard address mapping protocol that I'm not aware of. * Use global IB service resolution. * Define/extend an address resolution protocol that operates over IP. * Define/extend an address resolution protocol that operates over UDP. I'm hoping that someone has a wonderfully brilliant idea for this that would take about 1 day to implement. :) - Sean From andrea at qumranet.com Tue Apr 22 15:54:24 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:54:24 +0200 Subject: [ofa-general] Re: [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock In-Reply-To: References: Message-ID: <20080422225424.GT24536@duo.random> On Tue, Apr 22, 2008 at 01:26:13PM -0700, Christoph Lameter wrote: > Doing the right patch ordering would have avoided this patch and allow > better review. I didn't actually write this patch myself. This did it instead: s/anon_vma_lock/anon_vma_sem/ s/i_mmap_lock/i_mmap_sem/ s/locks/sems/ s/spinlock_t/struct rw_semaphore/ so it didn't look a big deal to redo it indefinitely. The right patch ordering isn't necessarily the one that reduces the total number of lines in the patchsets. The mmu-notifier-core is already converged and can go in. The rest isn't converged at all... nearly nobody commented on the other part (the few comments so far were negative), so there's no good reason to delay indefinitely what is already converged, given it's already feature complete for certain users of the code. 
My patch ordering looks more natural to me. What is finished goes in, the rest is orthogonal anyway. From holt at sgi.com Tue Apr 22 16:07:27 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 18:07:27 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422223545.GP24536@duo.random> References: <20080422223545.GP24536@duo.random> Message-ID: <20080422230727.GR30298@sgi.com> > The only other change I did has been to move mmu_notifier_unregister > at the end of the patchset after getting more questions about its > reliability and I documented a bit the rmmod requirements for > ->release. we'll think later if it makes sense to add it, nobody's > using it anyway. XPMEM is using it. GRU will be as well (probably already does). From holt at sgi.com Tue Apr 22 16:07:58 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 18:07:58 -0500 Subject: [ofa-general] Re: [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug In-Reply-To: <20080422224352.GS24536@duo.random> References: <3c804dca25b15017b220.1208872278@duo.random> <20080422224352.GS24536@duo.random> Message-ID: <20080422230758.GS30298@sgi.com> On Wed, Apr 23, 2008 at 12:43:52AM +0200, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 01:22:55PM -0700, Christoph Lameter wrote: > > Looks like this is not complete. There are numerous .h files missing which > > means that various structs are undefined (fs.h and rmap.h are needed > > f.e.) which leads to surprises when dereferencing fields of these struct. > > > > It seems that mm_types.h is expected to be included only in certain > > contexts. Could you make sure to include all necessary .h files? Or add > > some docs to clarify the situation here. > > Robin, what other changes did you need to compile? I only did that one > because I didn't hear any more feedback from you after I sent that > patch, so I assumed it was enough. It was perfect. Nothing else was needed. 
Thanks, Robin From clameter at sgi.com Tue Apr 22 16:13:20 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 16:13:20 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced In-Reply-To: <20080422223727.GQ24536@duo.random> References: <20080422223727.GQ24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 01:23:16PM -0700, Christoph Lameter wrote: > > Missing signoff by you. > > I thought I had to signoff if I conributed with anything that could > resemble copyright? Given I only merged that patch, I can add an > Acked-by if you like, but merging this in my patchset was already an > implicit ack ;-). No you have to include a signoff if the patch goes through your custody chain. This one did. Also add a From: Christoph Lameter somewhere if you want to signify that the patch came from me. From clameter at sgi.com Tue Apr 22 16:14:26 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 16:14:26 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080422224048.GR24536@duo.random> References: <20080422224048.GR24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 01:24:21PM -0700, Christoph Lameter wrote: > > Reverts a part of an earlier patch. Why isnt this merged into 1 of 12? > > To give zero regression risk to 1/12 when MMU_NOTIFIER=y or =n and the > mmu notifiers aren't registered by GRU or KVM. Keep in mind that the > whole point of my proposed patch ordering from day 0, is to keep as > 1/N, the absolutely minimum change that fully satisfy GRU and KVM > requirements. 4/12 isn't required by GRU/KVM so I keep it in a later > patch. I now moved mmu_notifier_unregister in a later patch too for > the same reason. 
We want a full solution, and this kind of patching makes the patches difficult to review because later patches revert earlier ones. From clameter at sgi.com Tue Apr 22 16:19:06 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 16:19:06 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock In-Reply-To: <20080422225424.GT24536@duo.random> References: <20080422225424.GT24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > The right patch ordering isn't necessarily the one that reduces the > total number of lines in the patchsets. The mmu-notifier-core is > already converged and can go in. The rest isn't converged at > all... nearly nobody commented on the other part (the few comments so > far were negative), so there's no good reason to delay indefinitely > what is already converged, given it's already feature complete for > certain users of the code. My patch ordering looks more natural to > me. What is finished goes in, the rest is orthogonal anyway. I would not want to review code that is later reverted or essentially changed in later patches. I only review your patches because we have a high interest in the patch. I suspect that others will be more willing to review this material if it were done the right way. If you cannot produce an easily reviewable and properly formatted patchset that follows conventions, then I will have to do it, because we really need to get this merged.
From clameter at sgi.com Tue Apr 22 16:20:35 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 16:20:35 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422223545.GP24536@duo.random> References: <20080422223545.GP24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > I'll send an update in any case to Andrew way before Saturday so > hopefully we'll finally get mmu-notifiers-core merged before next > week. Also I'm not updating my mmu-notifier-core patch anymore except > for strict bugfixes so don't worry about any more cosmetical bugs > being introduced while optimizing the code like it happened this time. I guess I have to prepare another patchset then? From steiner at sgi.com Tue Apr 22 17:28:49 2008 From: steiner at sgi.com (Jack Steiner) Date: Tue, 22 Apr 2008 19:28:49 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422230727.GR30298@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> Message-ID: <20080423002848.GA32618@sgi.com> On Tue, Apr 22, 2008 at 06:07:27PM -0500, Robin Holt wrote: > > The only other change I did has been to move mmu_notifier_unregister > > at the end of the patchset after getting more questions about its > > reliability and I documented a bit the rmmod requirements for > > ->release. we'll think later if it makes sense to add it, nobody's > > using it anyway. > > XPMEM is using it. GRU will be as well (probably already does). Yeppp. The GRU driver unregisters the notifier when all GRU mappings are unmapped. I could make it work either way - either with or without an unregister function. However, unregister is the most logical action to take when all mappings have been destroyed. 
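The register-on-first-mapping / unregister-on-last-unmapping pattern Jack describes can be modeled in plain C. This is only a toy userspace sketch of the bookkeeping (the names are hypothetical and it is not the kernel mmu-notifier API; the commented calls stand in for mmu_notifier_register/unregister):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a GRU-style driver that registers its notifier when the
 * first mapping is created and unregisters when the last is destroyed. */
struct toy_driver {
    int mappings;    /* number of live mappings */
    bool registered; /* is the (hypothetical) notifier registered? */
};

static void toy_create_mapping(struct toy_driver *d)
{
    if (d->mappings++ == 0)
        d->registered = true;  /* stand-in for mmu_notifier_register() */
}

static void toy_destroy_mapping(struct toy_driver *d)
{
    if (--d->mappings == 0)
        d->registered = false; /* stand-in for mmu_notifier_unregister() */
}
```

Without an unregister call, a driver in this position would instead have to wait for its ->release callback at process teardown, which is exactly the rmmod complication discussed above.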
--- jack From steiner at sgi.com Tue Apr 22 17:31:40 2008 From: steiner at sgi.com (Jack Steiner) Date: Tue, 22 Apr 2008 19:31:40 -0500 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: References: Message-ID: <20080423003140.GB32618@sgi.com> On Tue, Apr 22, 2008 at 03:51:16PM +0200, Andrea Arcangeli wrote: > Hello, > > This is the latest and greatest version of the mmu notifier patch #v13. > FWIW, I have updated the GRU driver to use this patch (plus the fixups). No problems. AFAICT, everything works. --- jack From jgunthorpe at obsidianresearch.com Tue Apr 22 17:47:39 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 22 Apr 2008 18:47:39 -0600 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> Message-ID: <20080423004739.GB17925@obsidianresearch.com> On Tue, Apr 22, 2008 at 03:23:30PM -0700, Sean Hefty wrote: > >Yes, but the actual programming of the values into the QP is done by > >cm_init_qp_rtr_attr/cm_init_qp_rts_attr (well, in many cases) - which > >takes the values from the rep/req directly, without modification. > > The values exchanged in the REP are saved to cm_id_priv. Those values are used. > The passive side ULP is responsible for using the correct value. Either by > returning what was sent in the REQ, or by adjusting the values down. Note that > the active side will see the values in the REP and can reject the connection if > they are set too large. Ok. Well, if the ULP is responsible, I have yet to see a ULP example, or in-kernel ULP that does it right. Every one ignores the REQ and/or does not limit the REQ's values to the device's capabilities.
The other view is that the CM should just handle this and the ULP should only have the option to further reduce the value. It is not a parameter that affects the operation of the ULP, so having it be lowered is not significant. The actual value can always be queried with ibv_query_qp. I guess that is really what it comes down to: which do you think should be primarily responsible for this, and what should the API be? I can't disagree with you that the ULP should be responsible given the CM API, but that doesn't make it less awkward and annoying.... > >There is a bug here, it just isn't really obvious to me where the > >fixes should go to match the CM design. I was imagining that cm.c > >would adjust the REQ after reception, but there may be some downsides > >to that? > > The CM does adjust the value in the cm_id_priv structure based on the REP. Right, but I'm talking about when the passive side generates the REP. The contents of the REP should exactly match what the passive side QP is set to (i.e. lower than the device capabilities), and always be lower than the values in the REQ. > >All that I see in here is switching REQ's responder_resources value > >into the REQ's initiator_depth value (and vice versa) it does not > >limit it. > > The limits are left up to the ULP. Maybe the problem is that the ULPs are not > validating the limits? That is definitely true. > This is a sanity check only, intended to help catch errors sooner. Since it is > also used on the active side before sending a REQ, it can only check against the > local device capabilities. The sanity check could be expanded, but I don't see > a strong reason to add it. The modify QP operations will fail later if the > specified values are too large. But the whole point of this process is to get a working connection - the responder resources are not a ULP visible item, they are just something that must be negotiated and configured into the QP.
In truth, I can think of no reason for a ULP to use any value other than the device maximum or 0 for these resources. Saying that if the passive side messes up it will just die when the QP is modified is, IMHO, not good enough. > Yes, the client can specify values that were greater than those in > the REQ, but those values may technically still work. I don't see how? The active side may be unable to program the QP to those values, and using an initiator_depth larger than the peer's responder_resources will cause operational problems. The way the spec is written it is pretty much mandatory to limit to the values in the REQ when generating the REP. It would be perfectly conformant (and a good idea) for the active side to refuse to use a REP with larger values than its REQ. > >ie, consider the REQ values as reported through rdma_init_qp_attr, > >and limit the user's requested values on the passive side to be no > >greater than what the remote can do. > > I don't like the idea of reducing the limits without the user's knowledge. I > would rather fail the connection, which is what happens today (either through > the ucma_valid_param() checks or when modifying the QP). That is not entirely true, since the passive side's change overrides the values in the REQ from the active side, which can reduce the value without the user's knowledge. The question really is if you expect the CM to control this for you, or if you expect the ULP to do everything manually. Right now there seems to be a bit of both going on. > >Also support user passive side control over initiator depth. > > This is there today. Where? cma.c never programs max_rd_atomic in the qp. > I think I'm missing seeing whatever problem you're seeing. Well, what I have been interested in (Hal - what is your interest here?) is to use the device maximum and get rid of the hard coded values for responder resources and initiator depth in the ULPs.
This would be to allow some devices to have higher responder resources, based on hardware capability. Limited responder resources cause huge performance problems on high-latency connections. In the process I have observed that the spec is not being followed and there are cases where things go wrong if the two sides are not requesting identical things. I've also observed that the examples of how to use CM and RDMACM do not include the correct behavior. -- Jason Gunthorpe (780)4406067x832 Chief Technology Officer, Obsidian Research Corp Edmonton, Canada From jgunthorpe at obsidianresearch.com Tue Apr 22 20:52:42 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 22 Apr 2008 21:52:42 -0600 Subject: [ofa-general] [PATCH] Fixup handling of responder_resources in cmpost.c example Message-ID: <20080423035242.GA24343@obsidianresearch.com> Sean, For better clarity, here is an example of what I am looking at. This modifies the cmpost example of libibcm to handle responder resources negotiation at the ULP level. So far, all of the in-kernel and example user space CM and RDMA CM consumers I have looked at appear to need a patch like this. This is why I am wondering if moving this whole common process into the kernel and sharing it with all ULPs might be more appropriate.
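Distilled, the passive-side rule the cmpost.c patch applies is just a pair of clamps against the local device limits. As a standalone sketch (the struct and function names here are illustrative; only max_qp_rd_atom and max_qp_init_rd_atom correspond to the ibv_query_device attributes used in the patch):

```c
#include <assert.h>

/* Sketch of passive-side REP value selection: limit what the REQ asks
 * for to the local device capabilities. The kernel has already swapped
 * the REQ fields, so req_responder_resources is what this (passive)
 * side must be able to support as the responder. */
struct rep_limits {
    int responder_resources;
    int initiator_depth;
};

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

static struct rep_limits clamp_rep(int req_responder_resources,
                                   int req_initiator_depth,
                                   int dev_max_qp_rd_atom,
                                   int dev_max_qp_init_rd_atom)
{
    struct rep_limits rep;

    rep.responder_resources = min_int(req_responder_resources,
                                      dev_max_qp_rd_atom);
    rep.initiator_depth = min_int(req_initiator_depth,
                                  dev_max_qp_init_rd_atom);
    return rep;
}
```

With the "initiator depth is 128 and responder resources are 4" HCAs mentioned earlier in the thread, an active side asking for the maximum gets clamped to (4, 128) on the passive side instead of failing the modify-QP later.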
---- Show a more realistic example using maximum responder resources and what is required for that to work: - Limit the responder resources to device capability while producing the REP - Use the device capability values in generating the REQ - Match the passive side QP configuration to the REQ - Notes on initiator_depth and responder_resources value selection Signed-off-by: Jason Gunthorpe --- examples/cmpost.c | 48 +++++++++++++++++++++++++++++++++++++++--------- 1 files changed, 39 insertions(+), 9 deletions(-) diff --git a/examples/cmpost.c b/examples/cmpost.c index a85264b..1d876dd 100644 --- a/examples/cmpost.c +++ b/examples/cmpost.c @@ -50,6 +50,7 @@ struct cmtest { struct ib_cm_device *cm_dev; struct ibv_context *verbs; struct ibv_pd *pd; + struct ibv_device_attr dev_attr; /* cm info */ struct ibv_sa_path_rec path_rec; @@ -106,7 +107,8 @@ static int post_recvs(struct cmtest_node *node) return ret; } -static int modify_to_rtr(struct cmtest_node *node) +static int modify_to_rtr(struct cmtest_node *node, + struct ib_cm_rep_param *rep) { struct ibv_qp_attr qp_attr; int qp_attr_mask, ret; @@ -129,6 +131,10 @@ static int modify_to_rtr(struct cmtest_node *node) return ret; } qp_attr.rq_psn = node->qp->qp_num; + if (rep != NULL) { + qp_attr.max_dest_rd_atomic = rep->responder_resources; + qp_attr.max_rd_atomic = rep->initiator_depth; + } ret = ibv_modify_qp(node->qp, &qp_attr, qp_attr_mask); if (ret) { printf("failed to modify QP to RTR: %d\n", ret); @@ -167,10 +173,27 @@ static void req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) goto error1; node = &test.nodes[test.conn_index++]; + req = &event->param.req_rcvd; + memset(&rep, 0, sizeof rep); + + /* Limit the responder resources requested by the remote to our + capabilities. Note that the kernel swaps req->responder_resources + and req->initiator_depth, so that req->responder_resources + is actually the active side's initiator_depth. 
*/ + rep.responder_resources = req->responder_resources; + if (rep.responder_resources > test.dev_attr.max_qp_rd_atom) + rep.responder_resources = test.dev_attr.max_qp_rd_atom; + + /* Note: If this side of the connection is never going to use + RDMA Read then initiator_depth can be set to 0 here. */ + rep.initiator_depth = req->initiator_depth; + if (rep.initiator_depth > test.dev_attr.max_qp_init_rd_atom) + rep.initiator_depth = test.dev_attr.max_qp_init_rd_atom; + node->cm_id = cm_id; cm_id->context = node; - ret = modify_to_rtr(node); + ret = modify_to_rtr(node,&rep); if (ret) goto error2; @@ -178,13 +201,9 @@ static void req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) if (ret) goto error2; - req = &event->param.req_rcvd; - memset(&rep, 0, sizeof rep); rep.qp_num = node->qp->qp_num; rep.srq = (node->qp->srq != NULL); rep.starting_psn = node->qp->qp_num; - rep.responder_resources = req->responder_resources; - rep.initiator_depth = req->initiator_depth; rep.target_ack_delay = 20; rep.flow_control = req->flow_control; rep.rnr_retry_count = req->rnr_retry_count; @@ -207,7 +226,7 @@ static void rep_handler(struct cmtest_node *node, struct ib_cm_event *event) { int ret; - ret = modify_to_rtr(node); + ret = modify_to_rtr(node,0); if (ret) goto error; @@ -428,6 +447,9 @@ static int init(void) if (!test.verbs) return -1; + if (ibv_query_device(test.verbs,&test.dev_attr) != 0) + return -1; + test.cm_dev = ib_cm_open_device(test.verbs); if (!test.cm_dev) return -1; @@ -671,8 +693,16 @@ static void run_client(char *dst) memset(&req, 0, sizeof req); req.primary_path = &test.path_rec; req.service_id = __cpu_to_be64(0x1000); - req.responder_resources = 1; - req.initiator_depth = 1; + + /* When choosing the responder resources for a ULP, it is usually best + to use the maximum value of the HCA. 
If the other side is not going + to use RDMA READ then it should zero out initator_depth in the REP + which will zero out the local responder_resources when we program + the QP. Generally, initiator_depth should be either set to 0 or + min(max_qp_rd_atom,max_send_wr). Use 0 if RDMA READ is never going + to be sent from this side. */ + req.responder_resources = test.dev_attr.max_qp_rd_atom; + req.initiator_depth = test.dev_attr.max_qp_init_rd_atom; req.remote_cm_response_timeout = 20; req.local_cm_response_timeout = 20; req.retry_count = 5; -- 1.5.4.2 From sean.hefty at intel.com Tue Apr 22 20:58:21 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 20:58:21 -0700 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <20080423004739.GB17925@obsidianresearch.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> <20080423004739.GB17925@obsidianresearch.com> Message-ID: <000001c8a4f6$45cad550$92fd070a@amr.corp.intel.com> >Ok.. Well, if the ULP is responsible, I have yet to see a ULP example, >or in-kernel ULP that does it right. Every one ignores the REQ and/or does >not limit the REQ's values to the devices capabilities. I believe that DAPL does negotiate the values correctly. But see the end of this email for a way to simply things for the ULPs. >But the whole point of this process is to get a working connection - >the responder resources are not a ULP visible item, they are just >something that must be negotiated and configured into the QP. In >truth, I can think of no reason for a ULP to use any value other than >the device maximum or 0 for these resources. Saying that if the >passive side messes up it will just die when the QP is modified is, >IMHO, not good enough. For the IB CM, the policy controlling the use of those fields is given to the ULP. 
A check could be added to ib_send_cm_rep to fail if the ULP tries to use a value higher than that in the REQ. I would not have the CM automatically replace the user's values with its own. For the RDMA CM, there's no guarantee that the initiator_depth and responder_resources are available in the connection request. With iWarp, the values are not available unless embedded somewhere in the private data. >That is not entirely true, since the passive side's change overrides >the values in the REQ from the active side, which can reduce the value >without the user's knowledge. The question really is if you expect the >CM to control this for you, or if you expect the ULP to do everything >manually. Right now there seems to be a bit of both going on. The values in the REP are set by one user and given to the other. Just because the ULP ignores the value doesn't mean that it's hidden. The ULP really should control the policy on how to respond to a REQ or REP based on the values that are carried. >> >Also support user passive side control over initiator depth. >> >> This is there today. > >Where? cma.c never programs max_rd_atomic in the qp. rdma_accept() takes the responder_resources and initiator_depth as part of its input parameter. These are passed to the CM, which end up being used when getting the modify QP attributes. >Well, what I have been interested in (Hal - what is your interest >here?) is to use the device maximum and get rid of the hard coded >values for responder resources and initiator depth in the ULPs. This >would be to allow some devices to have higher responder resources, >based on hardware capability. Limited responder resources cause huge >performance problems on high-latency connections. To make it easier on the active side, we could allow the user to specify some 'MAX_RDMA' value that either the rdma cm or ib cm can key off of. The cm could then request initiator_depth and responder_resources based on the local HW maximums.
The passive side could also specify MAX_RDMA, which for IB would negotiate down to the values in the REQ and the local HW resources. This doesn't really work for iWarp, but then unless the data is exchanged as part of the private data, the best that the cm could do is guess based on the local HW maximums. In practice, this would probably work the majority of the time though. - Sean From mashirle at us.ibm.com Tue Apr 22 13:03:57 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Tue, 22 Apr 2008 13:03:57 -0700 Subject: [ofa-general] arp or ip patch to build a neigh permanent entry for IPoIB In-Reply-To: <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> References: <1208812763.22166.4.camel@localhost.localdomain> <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> Message-ID: <1208894637.14172.9.camel@localhost.localdomain> Thanks, Or. These kinds of patches should go upstream and be picked up by distros. Shirley From jgunthorpe at obsidianresearch.com Tue Apr 22 21:21:15 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 22 Apr 2008 22:21:15 -0600 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <000001c8a4f6$45cad550$92fd070a@amr.corp.intel.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> <20080423004739.GB17925@obsidianresearch.com> <000001c8a4f6$45cad550$92fd070a@amr.corp.intel.com> Message-ID: <20080423042115.GC27470@obsidianresearch.com> On Tue, Apr 22, 2008 at 08:58:21PM -0700, Sean Hefty wrote: > >But the whole point of this process is to get a working connection - > >the responder resources are not a ULP visible item, they are just > >something that must be negotiated and configured into the QP. In > >truth, I can think of no reason for a ULP to use any value other than > >the device maximum or 0 for these resources.
Saying that if the > >passive side messes up it will just die when the QP is modified is, > >IMHO, not good enough. > > For the IB CM, the policy controlling the use of those fields is given to the > ULP. A check could be added to ib_send_cm_rep to fail if the ULP tries to use a > value higher than that in the REQ. I would not have the CM automatically > replace the user's values with its own. Well, what if we just made this simpler for the ULP? The kernel, when it receives a REQ, will modify the values as it swaps them so they do not exceed the device maximum. The ULP can then further modify them if it wants, but does not have to do anything more than copy them into the REP to get correct function. This seems to handle the ULPs I have looked at. > For the RDMA CM, there's no guarantee that the initiator_depth and > responder_resources are available in the connection request. With iWarp, the > values are not available unless embedded somewhere in the private data. I am told that iWarp does not have this concept. The iWarp protocol does not require a limit on the number of un-acked RDMA READs/Atomics in flight. Only IB does, so ignoring the values entirely on iWarp seems fine to me. > >Where? cma.c never programs max_rd_atomic in the qp. > > rdma_accept() takes the responder_resources and initiator_depth as part of its > input parameter. These are passed to the CM, which end up being used when > getting the modify QP attributes. Hmmmmm, so that goes into the kernel cm_format_req_event, which saves it for cm_init_qp_rts_attr to later recover. Gotcha. It is unfortunate that the RTS transition cannot set both initiator_depth and responder_resources; it makes this awkward in the ULP. > >Well, what I have been interested in (Hal - what is your interest > >here?) is to use the device maximum and get rid of the hard coded > >values for responder resources and initiator depth in the ULPs.
This > >would be to allow some devices to have higher responder resources, > >based on hardware capability. Limited responder resources cause huge > >performance problems on high-latency connections. > > To make it easier on the active side, we could allow the user to specify some > 'MAX_RDMA' value that either the rdma cm or ib cm can key off of. The cm could > then request initiator_depth and responder_resources based on the local HW > maximums. The passive side could also specify MAX_RDMA, which for IB would > negotiate down to the values in the REQ and the local HW resources. Just setting the value to maximum in the REQ is not enough without the passive side limiting it to the device capabilities. That is where I started - it is easy to query the device and get the maximum, but just putting those values in the REQ causes one side to try to use more responder resources than it has. (initiator depth is 128 and responder resources are 4 in my test HCAs here) I do think that a MAX_RDMA value for the rdmacm especially is a pretty good idea. The rdma cm is already holding onto the device attributes structure. It could also automatically limit it based on the sendq length. Jason From sean.hefty at intel.com Tue Apr 22 21:51:53 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 21:51:53 -0700 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <20080423042115.GC27470@obsidianresearch.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> <20080423004739.GB17925@obsidianresearch.com> <000001c8a4f6$45cad550$92fd070a@amr.corp.intel.com> <20080423042115.GC27470@obsidianresearch.com> Message-ID: <000601c8a4fd$c021a250$92fd070a@amr.corp.intel.com> >Well, what if we just made this simpler for the ULP?
The kernel, when >it receives and REQ will modify the values as it swaps them so they do >not exceed the device maximum. The ULP can then further modify them if >it wants, but does not have to do anything more than copy them into >the REP to get correct function. This seems to handle the ULPs I have >looked at.. I had thought about this, but I'm hesitant to mask the requested values that were specified by the remote ULP. (Maybe the ULP can connect on a different device?) This does seem like the simplest solution though, and I have to stretch to think of a ULP that wouldn't like this behavior. >Just setting the value to maximum in the REQ is not enough without the >passive side limiting it to the device capabilities. That is where I >started - it is easy to query to device and get the maximum, but just >putting those values in the REQ causes one side to try to use more >responder resources than it has. (initiator depth is 128 and responder >resources are 4 in my test HCAs here) I was suggesting that the passive side could also use MAX_RDMA, but that doesn't remove the requirement that the passive side figure out the correct responder_resources value in order to transition to RTR. - Sean From yevgenyp at mellanox.co.il Tue Apr 22 22:53:02 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 08:53:02 +0300 Subject: [ofa-general][PATCH] mlx4: Prereserved Qp regions (MP support, Patch4) In-Reply-To: <000201c8a4a6$334addd0$40fc070a@amr.corp.intel.com> References: <480D8803.1050404@mellanox.co.il> <000201c8a4a6$334addd0$40fc070a@amr.corp.intel.com> Message-ID: <480ECEBE.5030706@mellanox.co.il> Sean Hefty wrote: >> We reserve Qp ranges to be used by other modules in case >> the ports come up as Ethernet ports. >> The qps are reserved at the end of the QP table. >> (This way we assure that they are alligned to their size) > > Can you explain this in more detail? What are the 'other modules'? Are you > reserving specific QP numbers? 
Are the QPs only reserved when running over > Ethernet? Why is this done/needed exactly? > > I don't really understand the alignment comment, but that's a separate issue for > me. > > - Sean > > Those ranges are always reserved, because the port protocol can change at runtime. One example of this requirement is address steering: we need an RX queue for every combination of MAC and VLAN (a 128x128 table). The QPs are reserved at the end of the QP table. --Yevgeny From ogerlitz at voltaire.com Tue Apr 22 23:35:44 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 23 Apr 2008 09:35:44 +0300 Subject: [ofa-general] CM ID In-Reply-To: References: Message-ID: <480ED8C0.2020702@voltaire.com> Roland Dreier wrote: > It doesn't really make sense to use any verbs before you have resolved > the address, because you don't know which device will be used until the > address is used. Philip, Re the passive side: if your listener binds to a specific IP address, then after rdma_bind_address() returns the verbs pointer is in place to use. If you bind to IPADDR_ANY, then you would have to serve connection requests arriving from all active ports on this system, where for each one of them the rdma cm will create an ID which is associated with the (verbs) device through which this REQ arrived. As for the active side, after rdma_resolve_addr returns you can create the PD, MR, CQ, etc. resources or attach this session to ones used by other sessions. In case you use the connected service of the rdma cm, you must create a QP per connection. Or. From ogerlitz at voltaire.com Wed Apr 23 00:04:29 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 23 Apr 2008 10:04:29 +0300 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) In-Reply-To: <480D8660.3060001@mellanox.co.il> References: <480D8660.3060001@mellanox.co.il> Message-ID: <480EDF7D.4070103@voltaire.com> Yevgeny Petrilin wrote: > >From d0d0ac877ab47f3a8a5f1564e5c48f53245583b9 Mon Sep 17 00:00:00 2001 > From: Yevgeny Petrilin > Date: Mon, 21 Apr 2008 10:10:01 +0300 > Subject: [PATCH] mlx4: Moving db management to mlx4_core Hi Yevgeny, Can you use a [PATCH m/n v3] or similar syntax in the subject line of the patches? It would be much easier to review and work with your patch sets this (common) way. Also, I wasn't sure against what git tree/branch they are being generated; can you clarify that? Or.
From yevgenyp at mellanox.co.il Wed Apr 23 03:01:21 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 13:01:21 +0300 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) In-Reply-To: <480EDF7D.4070103@voltaire.com> References: <480D8660.3060001@mellanox.co.il> <480EDF7D.4070103@voltaire.com> Message-ID: <480F08F1.9070507@mellanox.co.il> Or Gerlitz wrote: > Yevgeny Petrilin wrote: >> >From d0d0ac877ab47f3a8a5f1564e5c48f53245583b9 Mon Sep 17 00:00:00 2001 >> From: Yevgeny Petrilin >> Date: Mon, 21 Apr 2008 10:10:01 +0300 >> Subject: [PATCH] mlx4: Moving db management to mlx4_core > Hi Yevgeny, > > Can you use a [PATCH m/n v3] or similar syntax in the subject line of the > patches? It would be much easier to review and work with your patch sets > this (common) way. > > Also, I wasn't sure against what git tree/branch they are being > generated; can you clarify that? > Or. > > Thanks for your comment, I will use that format. The patches are generated against the "for-2.6.26" branch. --Yevgeny From yevgenyp at mellanox.co.il Wed Apr 23 03:41:15 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 13:41:15 +0300 Subject: [ofa-general][PATCH] mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: References: Message-ID: <480F124B.1050804@mellanox.co.il> Shirley Ma wrote: > Hello Yevgeny, > > Can you give more details of this patch? What's the relationship > between CQ, EQ, port? > I was thinking to implement it in the upper layer. Is it better to > implement it in the upper-layer protocol, rather than the device layer? > > thanks > Shirley Hi, We refer to EQs as interrupt vectors (each EQ is attached to an MSI-X vector). Creating multiple completion EQs helps us distribute the interrupt load (and the software interrupt handling associated with it) among all CPUs.
For example, distributing TCP flows among multiple cores is important for 10GE devices to sustain wire-speed with lots of connections. Each CQ is attached to an EQ and receives its completion interrupts from that EQ. CQ and EQ are not per port. Implementing this in the device layer allows all ULPs to use the feature. We do not expose an EQ allocation API, because there is no point creating more EQs than CPUs. --Yevgeny From ogerlitz at voltaire.com Wed Apr 23 03:52:44 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 23 Apr 2008 13:52:44 +0300 Subject: [ofa-general][PATCH] mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: <480F124B.1050804@mellanox.co.il> References: <480F124B.1050804@mellanox.co.il> Message-ID: <480F14FC.30107@voltaire.com> Yevgeny Petrilin wrote: > For example, distributing TCP flows among multiple cores is important for > 10GE devices to sustain wire-speed with lots of connections. In that respect (distributing TCP flows among cores), is there anything special here which is related to 10GbE but not to IPoIB? > > Each CQ is attached to an EQ and receives its completion interrupts from that EQ. > > CQ and EQ are not per port. > > Implementing this in the device layer allows all ULPs to use the feature. > We do not expose an EQ allocation API, because there is no point creating more EQs > than CPUs. CQs are not per port, but netdevices are bound to a port (it's correct that a few of them can be bound to the same port, e.g. with different PKEYs or VLAN tags). Maybe it's worth thinking about an API that either lets the ULP dictate to which CPU/core the EQ serving its CQ should direct interrupts, or, if the ULP doesn't care, lets the driver allocate that in round-robin fashion. Shirley, assuming the ib core module would expose such a binding API, what's your idea of using it in IPoIB? Or.
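The round-robin fallback Or suggests for ULPs with no CPU preference is easy to sketch. This is a hypothetical helper, not the mlx4 code: it just hands out completion-vector (EQ) indices in rotation, one EQ per CPU:

```c
#include <assert.h>

/* Toy round-robin assignment of CQs to per-CPU completion EQ vectors,
 * used when the ULP expresses no affinity preference. */
struct eq_allocator {
    int num_eqs; /* e.g. one EQ per online CPU */
    int next;    /* next vector to hand out */
};

static int alloc_eq_vector(struct eq_allocator *a)
{
    int eq = a->next;

    a->next = (a->next + 1) % a->num_eqs;
    return eq;
}
```

With four EQs, successive CQ creations would be spread as 0, 1, 2, 3, 0, ... so completion interrupts (and their softirq handling) are distributed across cores rather than piling onto one.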
From ruimario at gmail.com Wed Apr 23 06:20:22 2008 From: ruimario at gmail.com (Rui Machado) Date: Wed, 23 Apr 2008 15:20:22 +0200 Subject: [ofa-general] beginner resources Message-ID: <6978b4af0804230620p560c33c5hfa8385a57bbed80c@mail.gmail.com> >>is this the right list to ask totally beginner questions >> (even code snippets) or is there any other resource for this matter? >Beginner questions are fine. But you may be directed to a spec, RFC, man page, >etc. > >Code examples are available with the userspace libraries (libibverbs, librdmacm) >that may help. The libraries also provide man pages for the various APIs. > >- Sean Redirection is fine as long as I can solve my problem :) and I can learn something. I had a look at the rping example and I'm trying to use Roland Dreier's examples. But my example simply doesn't work. I'm totally new to this so please bear with me. If someone has time to have a look at http://pastebin.com/m708b032c and http://pastebin.com/m13673097 it would be much appreciated, and any comments are welcome. I need all the feedback possible to start understanding things. Thanks a lot for the help. ./Rui -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrea at qumranet.com Wed Apr 23 06:33:03 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 15:33:03 +0200 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: References: <20080422182213.GS22493@sgi.com> <20080422184335.GN24536@duo.random> <20080422194223.GT22493@sgi.com> Message-ID: <20080423133303.GU24536@duo.random> On Tue, Apr 22, 2008 at 01:30:53PM -0700, Christoph Lameter wrote: > One solution would be to separate the invalidate_page() callout into a > patch at the very end that can be omitted. AFAICT there is no compelling > reason to have this callback and it complicates the API for the device > driver writers.
> Not having this callback makes the way that mmu notifiers > are called from the VM uniform which is a desirable goal. I agree that the invalidate_page optimization can be moved to a separate patch. That will be a patch that definitely alters the API in a non-backwards-compatible way (unlike 2-12 in my #v13, which are all backwards compatible in terms of the mmu notifier API). invalidate_page is beneficial to both mmu notifier users, and a bit beneficial to the do_wp_page users too. So there's no point in removing it from my mmu-notifier-core: as long as the mmu-notifier-core is 1/N in my patchset, and N/N in your patchset, the differences caused by that ordering difference are a bigger change than invalidate_page existing or not. As I expected, invalidate_page provided significant benefits (not just to GRU but to KVM too) without altering the locking scheme at all; this is because the page fault handler has to notice if begin->end both run anyway after follow_page/get_user_pages. So it's a no-brainer to keep, and my approach will avoid a non-backwards-compatible breakage of the API, IMHO. Not a big deal; nobody can care if the API will change, it will definitely change eventually, it's a kernel-internal one, but given I already have invalidate_page in my patch there's no reason to remove it as long as mmu-notifier-core remains N/N in your patchset.
From andrea at qumranet.com Wed Apr 23 06:36:19 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 15:36:19 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422230727.GR30298@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> Message-ID: <20080423133619.GV24536@duo.random> On Tue, Apr 22, 2008 at 06:07:27PM -0500, Robin Holt wrote: > > The only other change I did has been to move mmu_notifier_unregister > > at the end of the patchset after getting more questions about its > > reliability and I documented a bit the rmmod requirements for > > ->release. we'll think later if it makes sense to add it, nobody's > > using it anyway. > > XPMEM is using it. GRU will be as well (probably already does). XPMEM requires more patches anyway. Note that in a previous email you told me you weren't using it. I think GRU can work fine on 2.6.26 without mmu_notifier_unregister, like KVM too. You simply have to unpin the module count in ->release. The most important bit is that you have to do that anyway in case mmu_notifier_unregister fails (and it can fail because of vmalloc space shortage because somebody loaded some framebuffer driver or whatever).
From erezz at Voltaire.COM Wed Apr 23 06:41:24 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Wed, 23 Apr 2008 16:41:24 +0300 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <480C9BF8.9050401@Voltaire.COM> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu><1202850645.3137.132.camel@localhost.localdomain><20080212214632.GA14397@osc.edu><1202853468.3137.148.camel@localhost.localdomain><20080213195912.GC7372@osc.edu> <480C9BF8.9050401@Voltaire.COM> Message-ID: <480F3C84.40606@Voltaire.COM> Erez Zilber wrote: > > Pete Wyckoff wrote: > > James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 > 15:57 -0600: > > > >> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote: > >> > >>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 > 15:10 -0600: > >>> > >>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote: > >>>> > >>>>> iscsi_iser does not have any hardware DMA restrictions. Add a > >>>>> slave_configure function to remove any DMA alignment restriction, > >>>>> allowing the use of direct IO from arbitrary offsets within a page. > >>>>> Also disable page bouncing; iser has no restrictions on which > pages it > >>>>> can address. 
> >>>>> > >>>>> Signed-off-by: Pete Wyckoff > >>>>> --- > >>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ > >>>>> 1 files changed, 8 insertions(+), 0 deletions(-) > >>>>> > >>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c > b/drivers/infiniband/ulp/iser/iscsi_iser.c > >>>>> index be1b9fb..1b272a6 100644 > >>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c > >>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c > >>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) > >>>>> iser_conn_terminate(ib_conn); > >>>>> } > >>>>> > >>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) > >>>>> +{ > >>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); > >>>>> > >>>> You really don't want to do this. That signals to the block > layer that > >>>> we have an iommu, although it's practically the same thing as a > 64 bit > >>>> DMA mask ... but I'd just leave it to the DMA mask to set this up > >>>> correctly. Anything else is asking for a subtle bug to turn up years > >>>> from now when something causes the mask and the limit to be > mismatched. > >>>> > >>> Oh. I decided to add that line for symmetry with TCP, and was > >>> convinced by the arguments here: > >>> > >>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 > >>> Author: Mike Christie > >>> Date: Thu Jul 26 12:46:47 2007 -0500 > >>> > >>> [SCSI] iscsi_tcp: Turn off bounce buffers > >>> > >>> It was found by LSI that on setups with large amounts of memory > >>> we were bouncing buffers when we did not need to. If the iscsi tcp > >>> code touches the data buffer (or a helper does), > >>> it will kmap the buffer. iscsi_tcp also does not interact with > hardware, > >>> so it does not have any hw dma restrictions. This patch sets > the bounce > >>> buffer settings for our device queue so buffers should not be > bounced > >>> because of a driver limit. 
> >>> > >>> I don't see a convenient place to callback into particular iscsi > >>> devices to set the DMA mask per-host. It has to go on the > >>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which > >>> handles its DMA mask during device probe. > >>> > >> You should be taking your mask from the underlying infiniband device as > >> part of the setup, shouldn't you? > >> > > > > I think you're right about this. All the existing IB HW tries to > > set a 64-bit dma mask, but that's no reason to disable the mechanism > > entirely in iser. I'll remove that line that disables bouncing in > > my patch. Perhaps Mike will know if the iscsi_tcp usage is still > > appropriate. > > > > > > Let me make sure that I understand: you say that the IB HW driver (e.g. > ib_mthca) tries to set a 64-bit dma mask: > > err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); > if (err) { > dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA > mask.\n"); > err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); > if (err) { > dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); > goto err_free_res; > } > } > > So, in the example above, the driver will use a 64-bit mask or a 32-bit > mask (or fail). According to that, iSER (and SRP) needs to call > blk_queue_bounce_limit with the appropriate parameter, right? > Roland, James, I'm trying to fix this potential problem in iSER, and I have some questions about that. How can I get the DMA mask that the HCA driver is using (DMA_64BIT_MASK or DMA_32BIT_MASK)? Can I get it somehow from struct ib_device? Is it in ib_device->device? Another question is - after I get the DMA mask data from the HCA driver, I guess that I need to call blk_queue_bounce_limit with the appropriate parameter (BLK_BOUNCE_HIGH, BLK_BOUNCE_ANY or BLK_BOUNCE_ISA). Which value should iSER use according to the DMA mask info? For example, if the HCA driver sets DMA_64BIT_MASK, should iSER use BLK_BOUNCE_HIGH/BLK_BOUNCE_ANY/BLK_BOUNCE_ISA ? 
Thanks, Erez From andrea at qumranet.com Wed Apr 23 06:44:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 15:44:27 +0200 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: <20080422224048.GR24536@duo.random> Message-ID: <20080423134427.GW24536@duo.random> On Tue, Apr 22, 2008 at 04:14:26PM -0700, Christoph Lameter wrote: > We want a full solution and this kind of patching makes the patches > difficult to review because later patches revert earlier ones. I know you would rather see KVM development stalled for more months than get a partial solution now that already covers KVM and GRU with the same API that XPMEM will also use later. It's very unfair on your side to expect to stall other people's development if what you need has stronger requirements and can't be merged immediately. This is especially true given it was publicly stated that XPMEM never passed all regression tests anyway, so you can't possibly be in such a hurry as we are; we can't progress without this. In fact we can, but it would be a huge effort, it would run _slower_, and it would all need to be deleted once mmu notifiers are in. Note that the only patch that you can avoid with your approach is mm_lock-rwsem; given that's software developed and not human developed I don't see a big deal of wasted effort. The main difference is the ordering. Most of the code is orthogonal so there's not much to revert.
From jlentini at netapp.com Wed Apr 23 06:50:32 2008 From: jlentini at netapp.com (James Lentini) Date: Wed, 23 Apr 2008 09:50:32 -0400 (EDT) Subject: [ofa-general] arp or ip patch to build a neigh permanent entry for IPoIB In-Reply-To: <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> References: <1208812763.22166.4.camel@localhost.localdomain> <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> Message-ID: On Tue, 22 Apr 2008, Or Gerlitz wrote: > On 4/22/08, Shirley Ma wrote: > > I am debugging an ipoib ping problem on a cluster. The arp, ip > command don't support using 20 bytes HW to build a permanent > entry manually. Can someone give me the pointer to the patch > if any? > > > > see http://lists.openfabrics.org/pipermail/general/2006-March/018487.html > > James, any news on this? is something need to be patched into ip/arp > to make this possible? The patch in my email was all I needed. I sent that patch to the iproute2 maintainer and it was accepted into the next version of the iproute2 release, see: http://git.kernel.org/?p=linux/kernel/git/shemminger/iproute2.git;a=commit;h=7b5657545dc246ae37690d660597e8fa37040205 Have you tried updating your ip command? From jlentini at netapp.com Wed Apr 23 06:56:50 2008 From: jlentini at netapp.com (James Lentini) Date: Wed, 23 Apr 2008 09:56:50 -0400 (EDT) Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> Message-ID: On Tue, 22 Apr 2008, Sean Hefty wrote: > I have a need to start looking at possible ways to map IP address to > GIDs when crossing IP (and IB) subnets. This would be in addition > to or replace the ARP use by the rdma_cm. Possibilities include: > > * Use some standard address mapping protocol that I'm not aware of. > * Use global IB service resolution. > * Define/extend an address resolution protocol that operates over IP. 
> * Define/extend an address resolution protocol that operates over UDP. > > I'm hoping that someone has a wonderfully brilliant idea for this > that would take about 1 day to implement. :) > > - Sean Is it time to bring back ATS? http://lists.openfabrics.org/pipermail/general/2005-August/010247.html From vlad at dev.mellanox.co.il Wed Apr 23 07:07:08 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 23 Apr 2008 17:07:08 +0300 Subject: [ofa-general] Re: [PATCH 1/1 v1] MLX4: Added resize_cq capability. In-Reply-To: References: <47E923CA.90804@dev.mellanox.co.il> <47F0A5A5.2010208@dev.mellanox.co.il> Message-ID: <480F428C.7080701@dev.mellanox.co.il> Hi Roland, Please apply the following patch that fixes resize CQ operation: From 36e7bf8a00f69abe1ad737c7976fd5f4f16c0851 Mon Sep 17 00:00:00 2001 From: Vladimir Sokolovsky Date: Wed, 23 Apr 2008 16:59:05 +0300 Subject: [PATCH] mlx4: The opcode modifier should be 0 for CQ resizing operation. Signed-off-by: Vladimir Sokolovsky --- drivers/net/mlx4/cq.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index caa5bcf..6fda0af 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -180,7 +180,7 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, cq_context->mtt_base_addr_h = mtt_addr >> 32; cq_context->mtt_base_addr_l = cpu_to_be32(mtt_addr & 0xffffffff); - err = mlx4_MODIFY_CQ(dev, mailbox, cq->cqn, 1); + err = mlx4_MODIFY_CQ(dev, mailbox, cq->cqn, 0); mlx4_free_cmd_mailbox(dev, mailbox); return err; -- 1.5.4.2 From holt at sgi.com Wed Apr 23 07:47:47 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 23 Apr 2008 09:47:47 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423133619.GV24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> Message-ID: <20080423144747.GU30298@sgi.com> On Wed, Apr 23, 
2008 at 03:36:19PM +0200, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 06:07:27PM -0500, Robin Holt wrote: > > > The only other change I did has been to move mmu_notifier_unregister > > > at the end of the patchset after getting more questions about its > > > reliability and I documented a bit the rmmod requirements for > > > ->release. we'll think later if it makes sense to add it, nobody's > > > using it anyway. > > > > XPMEM is using it. GRU will be as well (probably already does). > > XPMEM requires more patches anyway. Note that in previous email you > told me you weren't using it. I think GRU can work fine on 2.6.26 I said I could test without it. It is needed for the final version. It also makes the API consistent. What you are proposing is equivalent to having a file you can open but never close. This whole discussion seems ludicrous. You could refactor the code to get the sorted list of locks, pass that list into mm_lock to do the locking, do the register/unregister, then pass the same list into mm_unlock. If the allocation fails, you could fall back to the older slower method of repeatedly scanning the lists and acquiring locks in ascending order. > without mmu_notifier_unregister, like KVM too. You've simply to unpin > the module count in ->release. The most important bit is that you've > to do that anyway in case mmu_notifier_unregister fails (and it can If you are not going to provide the _unregister callout you need to change the API so I can scan the list of notifiers to see if my structures are already registered. We register our notifier structure at device open time. If we receive a _release callout, we mark our structure as unregistered. At device close time, if we have not been unregistered, we call _unregister. If you take away _unregister, I have an xpmem kernel structure in use _AFTER_ the device is closed with no indication that the process is using it. 
In that case, I need to get an extra reference to the module in my device open method and hold that reference until the _release callout. Additionally, if the user's program reopens the device, I need to scan the mmu_notifiers list to see if this task's notifier is already registered. I view _unregister as essential. Did I miss something? Thanks, Robin From yevgenyp at mellanox.co.il Wed Apr 23 07:49:30 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:49:30 +0300 Subject: [ofa-general][PATCH 0/12] mlx4: Multi Protocol support. Message-ID: <480F4C7A.4050005@mellanox.co.il> Multi Protocol gives the user the ability to run InfiniBand and Ethernet protocols on the same HCA (separately or at the same time). Main changes to mlx4:
1. The mlx4 device now holds the actual protocol for each port. The port types are determined through module parameters or through the sysfs interface. The requested types are verified against firmware capabilities in order to determine the actual port protocol.
2. The driver now manages the MAC and VLAN tables used by customers of the low level driver. Corresponding commands were added.
3. Completion EQs are created per CPU. Created CQs are attached to an EQ by a "Round Robin" algorithm, unless a specific EQ was requested.
4. Support for creating a collapsed CQ was added.
5. Additional reserved QP ranges were added. There is a range for the customers of the low level driver (IB, Ethernet, FCoE).
6. The QP allocation process changed: first a QP range should be reserved, then QPs can be allocated from that range. This supports the ability to allocate consecutive QPs. Appropriate changes were made in the allocation mechanism.
7. Actions common to all HW resource management (doorbell allocation, buffer allocation, MTT write) were moved to the low level driver.
8. Fibre Channel support was added.
Some of the patches were already sent; I am now resending all 12 patches.
Note: Patch 1/12 was already applied The patches that will be sent apply changes to mlx4_core and mlx4_ib modules, the mlx4_en module (ConnectX Ethernet driver) will be applied soon. --Yevgeny From yevgenyp at mellanox.co.il Wed Apr 23 07:51:33 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:51:33 +0300 Subject: [ofa-general][PATCH 1/12 v1] mlx4: Moving db management to mlx4_core Message-ID: <480F4CF5.3050709@mellanox.co.il> >From d0d0ac877ab47f3a8a5f1564e5c48f53245583b9 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 10:10:01 +0300 Subject: [PATCH] mlx4: Moving db management to mlx4_core mlx4_ib is no longer the only customer of mlx4_core. Thus the doorbell allocation was moved to the low level driver (same as buffer allocation). Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 6 +- drivers/infiniband/hw/mlx4/doorbell.c | 131 +-------------------------------- drivers/infiniband/hw/mlx4/main.c | 3 - drivers/infiniband/hw/mlx4/mlx4_ib.h | 33 +------- drivers/infiniband/hw/mlx4/qp.c | 6 +- drivers/infiniband/hw/mlx4/srq.c | 6 +- drivers/net/mlx4/alloc.c | 111 ++++++++++++++++++++++++++++ drivers/net/mlx4/main.c | 3 + drivers/net/mlx4/mlx4.h | 3 + include/linux/mlx4/device.h | 41 ++++++++++ 10 files changed, 175 insertions(+), 168 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 3557e7e..5e570bb 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -204,7 +204,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector uar = &to_mucontext(context)->uar; } else { - err = mlx4_ib_db_alloc(dev, &cq->db, 1); + err = mlx4_db_alloc(dev->dev, &cq->db, 1); if (err) goto err_cq; @@ -250,7 +250,7 @@ err_mtt: err_db: if (!context) - mlx4_ib_db_free(dev, &cq->db); + mlx4_db_free(dev->dev, &cq->db); err_cq: kfree(cq); @@ -435,7 +435,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) 
ib_umem_release(mcq->umem); } else { mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); - mlx4_ib_db_free(dev, &mcq->db); + mlx4_db_free(dev->dev, &mcq->db); } kfree(mcq); diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c index 1c36087..d17b36b 100644 --- a/drivers/infiniband/hw/mlx4/doorbell.c +++ b/drivers/infiniband/hw/mlx4/doorbell.c @@ -34,135 +34,10 @@ #include "mlx4_ib.h" -struct mlx4_ib_db_pgdir { - struct list_head list; - DECLARE_BITMAP(order0, MLX4_IB_DB_PER_PAGE); - DECLARE_BITMAP(order1, MLX4_IB_DB_PER_PAGE / 2); - unsigned long *bits[2]; - __be32 *db_page; - dma_addr_t db_dma; -}; - -static struct mlx4_ib_db_pgdir *mlx4_ib_alloc_db_pgdir(struct mlx4_ib_dev *dev) -{ - struct mlx4_ib_db_pgdir *pgdir; - - pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); - if (!pgdir) - return NULL; - - bitmap_fill(pgdir->order1, MLX4_IB_DB_PER_PAGE / 2); - pgdir->bits[0] = pgdir->order0; - pgdir->bits[1] = pgdir->order1; - pgdir->db_page = dma_alloc_coherent(dev->ib_dev.dma_device, - PAGE_SIZE, &pgdir->db_dma, - GFP_KERNEL); - if (!pgdir->db_page) { - kfree(pgdir); - return NULL; - } - - return pgdir; -} - -static int mlx4_ib_alloc_db_from_pgdir(struct mlx4_ib_db_pgdir *pgdir, - struct mlx4_ib_db *db, int order) -{ - int o; - int i; - - for (o = order; o <= 1; ++o) { - i = find_first_bit(pgdir->bits[o], MLX4_IB_DB_PER_PAGE >> o); - if (i < MLX4_IB_DB_PER_PAGE >> o) - goto found; - } - - return -ENOMEM; - -found: - clear_bit(i, pgdir->bits[o]); - - i <<= o; - - if (o > order) - set_bit(i ^ 1, pgdir->bits[order]); - - db->u.pgdir = pgdir; - db->index = i; - db->db = pgdir->db_page + db->index; - db->dma = pgdir->db_dma + db->index * 4; - db->order = order; - - return 0; -} - -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order) -{ - struct mlx4_ib_db_pgdir *pgdir; - int ret = 0; - - mutex_lock(&dev->pgdir_mutex); - - list_for_each_entry(pgdir, &dev->pgdir_list, list) - if (!mlx4_ib_alloc_db_from_pgdir(pgdir, db, 
order)) - goto out; - - pgdir = mlx4_ib_alloc_db_pgdir(dev); - if (!pgdir) { - ret = -ENOMEM; - goto out; - } - - list_add(&pgdir->list, &dev->pgdir_list); - - /* This should never fail -- we just allocated an empty page: */ - WARN_ON(mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)); - -out: - mutex_unlock(&dev->pgdir_mutex); - - return ret; -} - -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db) -{ - int o; - int i; - - mutex_lock(&dev->pgdir_mutex); - - o = db->order; - i = db->index; - - if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { - clear_bit(i ^ 1, db->u.pgdir->order0); - ++o; - } - - i >>= o; - set_bit(i, db->u.pgdir->bits[o]); - - if (bitmap_full(db->u.pgdir->order1, MLX4_IB_DB_PER_PAGE / 2)) { - dma_free_coherent(dev->ib_dev.dma_device, PAGE_SIZE, - db->u.pgdir->db_page, db->u.pgdir->db_dma); - list_del(&db->u.pgdir->list); - kfree(db->u.pgdir); - } - - mutex_unlock(&dev->pgdir_mutex); -} - -struct mlx4_ib_user_db_page { - struct list_head list; - struct ib_umem *umem; - unsigned long user_virt; - int refcnt; -}; - int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db) + struct mlx4_db *db) { - struct mlx4_ib_user_db_page *page; + struct mlx4_user_db_page *page; struct ib_umem_chunk *chunk; int err = 0; @@ -202,7 +77,7 @@ out: return err; } -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db) +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db) { mutex_lock(&context->db_page_mutex); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 136c76c..3c7f938 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -548,9 +548,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) goto err_uar; MLX4_INIT_DOORBELL_LOCK(&ibdev->uar_lock); - INIT_LIST_HEAD(&ibdev->pgdir_list); - mutex_init(&ibdev->pgdir_mutex); - ibdev->dev = dev; strlcpy(ibdev->ib_dev.name, 
"mlx4_%d", IB_DEVICE_NAME_MAX); diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 9e63732..5cf9947 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -43,24 +43,6 @@ #include #include -enum { - MLX4_IB_DB_PER_PAGE = PAGE_SIZE / 4 -}; - -struct mlx4_ib_db_pgdir; -struct mlx4_ib_user_db_page; - -struct mlx4_ib_db { - __be32 *db; - union { - struct mlx4_ib_db_pgdir *pgdir; - struct mlx4_ib_user_db_page *user_page; - } u; - dma_addr_t dma; - int index; - int order; -}; - struct mlx4_ib_ucontext { struct ib_ucontext ibucontext; struct mlx4_uar uar; @@ -88,7 +70,7 @@ struct mlx4_ib_cq { struct mlx4_cq mcq; struct mlx4_ib_cq_buf buf; struct mlx4_ib_cq_resize *resize_buf; - struct mlx4_ib_db db; + struct mlx4_db db; spinlock_t lock; struct mutex resize_mutex; struct ib_umem *umem; @@ -127,7 +109,7 @@ struct mlx4_ib_qp { struct mlx4_qp mqp; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; struct mlx4_ib_wq rq; u32 doorbell_qpn; @@ -154,7 +136,7 @@ struct mlx4_ib_srq { struct ib_srq ibsrq; struct mlx4_srq msrq; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; u64 *wrid; spinlock_t lock; int head; @@ -175,9 +157,6 @@ struct mlx4_ib_dev { struct mlx4_dev *dev; void __iomem *uar_map; - struct list_head pgdir_list; - struct mutex pgdir_mutex; - struct mlx4_uar priv_uar; u32 priv_pdn; MLX4_DECLARE_DOORBELL_LOCK(uar_lock); @@ -248,11 +227,9 @@ static inline struct mlx4_ib_ah *to_mah(struct ib_ah *ibah) return container_of(ibah, struct mlx4_ib_ah, ibah); } -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order); -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db); int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db); -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db); + struct mlx4_db *db); +void mlx4_ib_db_unmap_user(struct 
mlx4_ib_ucontext *context, struct mlx4_db *db); struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc); int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt, diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index b75efae..80ea8b9 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -514,7 +514,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err; if (!init_attr->srq) { - err = mlx4_ib_db_alloc(dev, &qp->db, 0); + err = mlx4_db_alloc(dev->dev, &qp->db, 0); if (err) goto err; @@ -580,7 +580,7 @@ err_buf: err_db: if (!pd->uobject && !init_attr->srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, &qp->db); err: return err; @@ -666,7 +666,7 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, kfree(qp->rq.wrid); mlx4_buf_free(dev->dev, qp->buf_size, &qp->buf); if (!qp->ibqp.srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, &qp->db); } } diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index beaa3b0..2046197 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -129,7 +129,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, if (err) goto err_mtt; } else { - err = mlx4_ib_db_alloc(dev, &srq->db, 0); + err = mlx4_db_alloc(dev->dev, &srq->db, 0); if (err) goto err_srq; @@ -200,7 +200,7 @@ err_buf: err_db: if (!pd->uobject) - mlx4_ib_db_free(dev, &srq->db); + mlx4_db_free(dev->dev, &srq->db); err_srq: kfree(srq); @@ -267,7 +267,7 @@ int mlx4_ib_destroy_srq(struct ib_srq *srq) kfree(msrq->wrid); mlx4_buf_free(dev->dev, msrq->msrq.max << msrq->msrq.wqe_shift, &msrq->buf); - mlx4_ib_db_free(dev, &msrq->db); + mlx4_db_free(dev->dev, &msrq->db); } kfree(msrq); diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 75ef9d0..43c6d04 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -196,3 +196,114 @@ void 
mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf) } } EXPORT_SYMBOL_GPL(mlx4_buf_free); + +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) +{ + struct mlx4_db_pgdir *pgdir; + + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); + if (!pgdir) + return NULL; + + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); + pgdir->bits[0] = pgdir->order0; + pgdir->bits[1] = pgdir->order1; + pgdir->db_page = dma_alloc_coherent(dma_device, PAGE_SIZE, + &pgdir->db_dma, GFP_KERNEL); + if (!pgdir->db_page) { + kfree(pgdir); + return NULL; + } + + return pgdir; +} + +static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir, + struct mlx4_db *db, int order) +{ + int o; + int i; + + for (o = order; o <= 1; ++o) { + i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); + if (i < MLX4_DB_PER_PAGE >> o) + goto found; + } + + return -ENOMEM; + +found: + clear_bit(i, pgdir->bits[o]); + + i <<= o; + + if (o > order) + set_bit(i ^ 1, pgdir->bits[order]); + + db->u.pgdir = pgdir; + db->index = i; + db->db = pgdir->db_page + db->index; + db->dma = pgdir->db_dma + db->index * 4; + db->order = order; + + return 0; +} + +int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_db_pgdir *pgdir; + int ret = 0; + + mutex_lock(&priv->pgdir_mutex); + + list_for_each_entry(pgdir, &priv->pgdir_list, list) + if (!mlx4_alloc_db_from_pgdir(pgdir, db, order)) + goto out; + + pgdir = mlx4_alloc_db_pgdir(&(dev->pdev->dev)); + if (!pgdir) { + ret = -ENOMEM; + goto out; + } + + list_add(&pgdir->list, &priv->pgdir_list); + + /* This should never fail -- we just allocated an empty page: */ + WARN_ON(mlx4_alloc_db_from_pgdir(pgdir, db, order)); + +out: + mutex_unlock(&priv->pgdir_mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(mlx4_db_alloc); + +void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + int o; + int i; + + 
mutex_lock(&priv->pgdir_mutex); + + o = db->order; + i = db->index; + + if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { + clear_bit(i ^ 1, db->u.pgdir->order0); + ++o; + } + i >>= o; + set_bit(i, db->u.pgdir->bits[o]); + + if (bitmap_full(db->u.pgdir->order1, MLX4_DB_PER_PAGE / 2)) { + dma_free_coherent(&(dev->pdev->dev), PAGE_SIZE, + db->u.pgdir->db_page, db->u.pgdir->db_dma); + list_del(&db->u.pgdir->list); + kfree(db->u.pgdir); + } + + mutex_unlock(&priv->pgdir_mutex); +} +EXPORT_SYMBOL_GPL(mlx4_db_free); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 49a4aca..a6aa49f 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -798,6 +798,9 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&priv->ctx_list); spin_lock_init(&priv->ctx_lock); + INIT_LIST_HEAD(&priv->pgdir_list); + mutex_init(&priv->pgdir_mutex); + /* * Now reset the HCA before we touch the PCI capabilities or * attempt a firmware command, since a boot ROM may have left diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 7333681..a4023c2 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -257,6 +257,9 @@ struct mlx4_priv { struct list_head ctx_list; spinlock_t ctx_lock; + struct list_head pgdir_list; + struct mutex pgdir_mutex; + struct mlx4_fw fw; struct mlx4_cmd cmd; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ff7df1a..9c87dd3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -37,6 +37,8 @@ #include #include +#include + #include enum { @@ -208,6 +210,37 @@ struct mlx4_mtt { int page_shift; }; +enum { + MLX4_DB_PER_PAGE = PAGE_SIZE / 4 +}; + +struct mlx4_db_pgdir { + struct list_head list; + DECLARE_BITMAP(order0, MLX4_DB_PER_PAGE); + DECLARE_BITMAP(order1, MLX4_DB_PER_PAGE / 2); + unsigned long *bits[2]; + __be32 *db_page; + dma_addr_t db_dma; +}; + +struct mlx4_user_db_page { + struct list_head list; + 
struct ib_umem *umem; + unsigned long user_virt; + int refcnt; +}; + +struct mlx4_db { + __be32 *db; + union { + struct mlx4_db_pgdir *pgdir; + struct mlx4_user_db_page *user_page; + } u; + dma_addr_t dma; + int index; + int order; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -341,6 +374,9 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, struct mlx4_buf *buf); +int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order); +void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 07:53:51 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:53:51 +0300 Subject: [ofa-general][PATCH 2/12 v1] mlx4: HW queues resource management Message-ID: <480F4D7F.8000707@mellanox.co.il> >From 3b15a6bba9cb79805198f64985433a33a3a096dc Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 11:06:41 +0300 Subject: [PATCH] mlx4_core: HW queues resource management Added HW queues management API. Wraps buffer and doorbell allocation and mtt write. 
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/alloc.c | 44 +++++++++++++++++++++++++++++++++++++++++++ include/linux/mlx4/device.h | 11 ++++++++++ 2 files changed, 55 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 43c6d04..f36d79e 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -307,3 +307,47 @@ void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db) mutex_unlock(&priv->pgdir_mutex); } EXPORT_SYMBOL_GPL(mlx4_db_free); + +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size, int max_direct) +{ + int err; + + err = mlx4_db_alloc(dev, &wqres->db, 1); + if (err) + return err; + *wqres->db.db = 0; + + if (mlx4_buf_alloc(dev, size, max_direct, &wqres->buf)) { + err = -ENOMEM; + goto err_db; + } + + err = mlx4_mtt_init(dev, wqres->buf.npages, wqres->buf.page_shift, + &wqres->mtt); + if (err) + goto err_buf; + err = mlx4_buf_write_mtt(dev, &wqres->mtt, &wqres->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev, &wqres->mtt); +err_buf: + mlx4_buf_free(dev, size, &wqres->buf); +err_db: + mlx4_db_free(dev, &wqres->db); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_alloc_hwq_res); + +void mlx4_free_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size) +{ + mlx4_mtt_cleanup(dev, &wqres->mtt); + mlx4_buf_free(dev, size, &wqres->buf); + mlx4_db_free(dev, &wqres->db); +} +EXPORT_SYMBOL_GPL(mlx4_free_hwq_res); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index d5fb774..0505732 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -241,6 +241,12 @@ struct mlx4_db { int order; }; +struct mlx4_hwq_resources { + struct mlx4_db db; + struct mlx4_mtt mtt; + struct mlx4_buf buf; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -377,6 +383,11 @@ int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_db_alloc(struct mlx4_dev *dev, struct 
mlx4_db *db, int order); void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db); +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size, int max_direct); +void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, + int size); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 07:54:51 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:54:51 +0300 Subject: [ofa-general][PATCH 3/12 v1] Message-ID: <480F4DBB.20403@mellanox.co.il> >From 3978a59af72fddb9b98156a7ecf9018b8bf5b076 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 13:26:14 +0300 Subject: [PATCH] mlx4: Qp range reservation Prior to allocating a qp, one needs to reserve an aligned range of qps. The change is made to enable allocation of consecutive qps.
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/qp.c | 9 +++++ drivers/net/mlx4/alloc.c | 77 ++++++++++++++++++++++++++++++++++++++- drivers/net/mlx4/mlx4.h | 2 + drivers/net/mlx4/qp.c | 44 ++++++++++++++++------- include/linux/mlx4/device.h | 5 ++- 5 files changed, 122 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 80ea8b9..88aae1b 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -544,6 +544,11 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } } + if (!sqpn) + err = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn); + if (err) + goto err_wrid; + err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp); if (err) goto err_wrid; @@ -654,6 +659,10 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, mlx4_ib_unlock_cqs(send_cq, recv_cq); mlx4_qp_free(dev->dev, &qp->mqp); + + if (!is_sqp(dev, qp)) + mlx4_qp_release_range(dev->dev, qp->mqp.qpn, 1); + mlx4_mtt_cleanup(dev->dev, &qp->mtt); if (is_user) { diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index f36d79e..4601506 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -73,7 +73,82 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved) +static unsigned long find_aligned_range(unsigned long *bitmap, + u32 start, u32 nbits, + int len, int align) +{ + unsigned long end, i; + +again: + start = ALIGN(start, align); + while ((start < nbits) && test_bit(start, bitmap)) + start += align; + if (start >= nbits) + return -1; + + end = start+len; + if (end > nbits) + return -1; + for (i = start+1; i < end; i++) { + if (test_bit(i, bitmap)) { + start = i+1; + goto again; + } + } + return start; +} + +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) +{ + u32 obj, i; + + if (likely(cnt == 1 && 
align == 1)) + return mlx4_bitmap_alloc(bitmap); + + spin_lock(&bitmap->lock); + + obj = find_aligned_range(bitmap->table, bitmap->last, + bitmap->max, cnt, align); + if (obj >= bitmap->max) { + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + obj = find_aligned_range(bitmap->table, 0, + bitmap->max, + cnt, align); + } + + if (obj < bitmap->max) { + for (i = 0; i < cnt; i++) + set_bit(obj+i, bitmap->table); + if (obj == bitmap->last) { + bitmap->last = (obj + cnt); + if (bitmap->last >= bitmap->max) + bitmap->last = 0; + } + obj |= bitmap->top; + } else + obj = -1; + + spin_unlock(&bitmap->lock); + + return obj; +} + +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) +{ + u32 i; + + obj &= bitmap->max - 1; + + spin_lock(&bitmap->lock); + for (i = 0; i < cnt; i++) + clear_bit(obj+i, bitmap->table); + bitmap->last = min(bitmap->last, obj); + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + spin_unlock(&bitmap->lock); +} + +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) { int i; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index a4023c2..89d4ccc 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -287,6 +287,8 @@ static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap); void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index fa24e65..dff8e66 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -147,19 +147,42 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, } EXPORT_SYMBOL_GPL(mlx4_qp_modify); -int 
mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp) +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + int qpn; + + qpn = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align); + if (qpn == -1) + return -ENOMEM; + + *base = qpn; + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_reserve_range); + +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + if (base_qpn < dev->caps.sqp_start + 8) + return; + + mlx4_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt); +} +EXPORT_SYMBOL_GPL(mlx4_qp_release_range); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_qp_table *qp_table = &priv->qp_table; int err; - if (sqpn) - qp->qpn = sqpn; - else { - qp->qpn = mlx4_bitmap_alloc(&qp_table->bitmap); - if (qp->qpn == -1) - return -ENOMEM; - } + if (!qpn) + return -EINVAL; + + qp->qpn = qpn; err = mlx4_table_get(dev, &qp_table->qp_table, qp->qpn); if (err) @@ -208,9 +231,6 @@ err_put_qp: mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); err_out: - if (!sqpn) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); - return err; } EXPORT_SYMBOL_GPL(mlx4_qp_alloc); @@ -240,8 +260,6 @@ void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp) mlx4_table_put(dev, &qp_table->auxc_table, qp->qpn); mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); - if (qp->qpn >= dev->caps.sqp_start + 8) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); } EXPORT_SYMBOL_GPL(mlx4_qp_free); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 0505732..9c77bf3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -392,7 +392,10 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct 
mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp); +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp); void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp); int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, -- 1.5.4
From yevgenyp at mellanox.co.il Wed Apr 23 07:58:32 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:58:32 +0300 Subject: [ofa-general][PATCH 4/12 v2] mlx4: Pre reserved Qp regions Message-ID: <480F4E98.7010803@mellanox.co.il> >From 2dd4f8abdedda736adca5818c98f7a67d339ba7e Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 14:39:27 +0300 Subject: [PATCH] mlx4: Pre reserved Qp regions. We reserve Qp ranges to be used by other modules in case the ports come up as Ethernet ports. The qps are reserved at the end of the QP table. (This way we assure that they are aligned to their size) We need to consider these reserved ranges in bitmap creation : The effective max parameter.
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/alloc.c | 38 ++++++++++++++++-------- drivers/net/mlx4/fw.c | 5 +++ drivers/net/mlx4/fw.h | 2 + drivers/net/mlx4/main.c | 65 +++++++++++++++++++++++++++++++++++++++---- drivers/net/mlx4/mlx4.h | 4 ++ drivers/net/mlx4/qp.c | 55 ++++++++++++++++++++++++++++++++++-- include/linux/mlx4/device.h | 19 ++++++++++++- include/linux/mlx4/qp.h | 4 ++ 8 files changed, 169 insertions(+), 23 deletions(-) diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 4601506..4b6074d 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -44,15 +44,18 @@ u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap) spin_lock(&bitmap->lock); - obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last); - if (obj >= bitmap->max) { + obj = find_next_zero_bit(bitmap->table, bitmap->effective_max, + bitmap->last); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; - obj = find_first_zero_bit(bitmap->table, bitmap->max); + obj = find_first_zero_bit(bitmap->table, bitmap->effective_max); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { set_bit(obj, bitmap->table); - bitmap->last = (obj + 1) & (bitmap->max - 1); + bitmap->last = (obj + 1); + if (bitmap->last == bitmap->effective_max) + bitmap->last = 0; obj |= bitmap->top; } else obj = -1; @@ -108,20 +111,20 @@ u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) spin_lock(&bitmap->lock); obj = find_aligned_range(bitmap->table, bitmap->last, - bitmap->max, cnt, align); - if (obj >= bitmap->max) { + bitmap->effective_max, cnt, align); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; obj = find_aligned_range(bitmap->table, 0, - bitmap->max, + bitmap->effective_max, cnt, align); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { for (i = 0; i < cnt; i++) set_bit(obj+i, bitmap->table); if (obj == bitmap->last) { bitmap->last 
= (obj + cnt); - if (bitmap->last >= bitmap->max) + if (bitmap->last >= bitmap->effective_max) bitmap->last = 0; } obj |= bitmap->top; @@ -147,8 +150,9 @@ void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, - u32 num, u32 mask, u32 reserved) +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max) { int i; @@ -160,6 +164,7 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, bitmap->top = 0; bitmap->max = num; bitmap->mask = mask; + bitmap->effective_max = effective_max; spin_lock_init(&bitmap->lock); bitmap->table = kzalloc(BITS_TO_LONGS(num) * sizeof (long), GFP_KERNEL); if (!bitmap->table) @@ -171,6 +176,13 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, return 0; } +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) +{ + return mlx4_bitmap_init_with_effective_max(bitmap, num, mask, + reserved, num); +} + void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap) { kfree(bitmap->table); diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index d82f275..b0ad0d1 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -325,6 +325,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 +#define QUERY_PORT_MAX_MACVLAN_OFFSET 0x0a #define QUERY_PORT_MAX_VL_OFFSET 0x0b for (i = 1; i <= dev_cap->num_ports; ++i) { @@ -342,6 +343,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); MLX4_GET(field, outbox, QUERY_PORT_MAX_VL_OFFSET); dev_cap->max_vl[i] = field & 0xf; + MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); + dev_cap->log_max_macs[i] = field & 0xf; + dev_cap->log_max_vlans[i] = field >> 4; + } } diff --git a/drivers/net/mlx4/fw.h 
b/drivers/net/mlx4/fw.h index 306cb9b..a2e827c 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -97,6 +97,8 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 log_max_macs[MLX4_MAX_PORTS + 1]; + u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index a6aa49f..f309532 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -85,6 +85,22 @@ static struct mlx4_profile default_profile = { .num_mtt = 1 << 20, }; +static int num_mac = 1; +module_param_named(num_mac, num_mac, int, 0444); +MODULE_PARM_DESC(num_mac, "Maximum number of MACs per ETH port " + "(1-127, default 1)"); + +static int num_vlan; +module_param_named(num_vlan, num_vlan, int, 0444); +MODULE_PARM_DESC(num_vlan, "Maximum number of VLANs per ETH port " + "(0-126, default 0)"); + +static int use_prio; +module_param_named(use_prio, use_prio, bool, 0444); +MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " + "(0/1, default 0)"); + + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; @@ -134,7 +150,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.max_rq_sg = dev_cap->max_rq_sg; dev->caps.max_wqes = dev_cap->max_qp_sz; dev->caps.max_qp_init_rdma = dev_cap->max_requester_per_qp; - dev->caps.reserved_qps = dev_cap->reserved_qps; dev->caps.max_srq_wqes = dev_cap->max_srq_sz; dev->caps.max_srq_sge = dev_cap->max_rq_sg - 1; dev->caps.reserved_srqs = dev_cap->reserved_srqs; @@ -161,6 +176,39 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.stat_rate_support = dev_cap->stat_rate_support; dev->caps.max_gso_sz = dev_cap->max_gso_sz; + dev->caps.log_num_macs = ilog2(roundup_pow_of_two(num_mac + 1)); + dev->caps.log_num_vlans = ilog2(roundup_pow_of_two(num_vlan + 2)); + dev->caps.log_num_prios = use_prio ? 
3: 0; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { + dev->caps.log_num_macs = dev_cap->log_max_macs[i]; + mlx4_warn(dev, "Requested number of MACs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_macs); + } + if (dev->caps.log_num_vlans > dev_cap->log_max_vlans[i]) { + dev->caps.log_num_vlans = dev_cap->log_max_vlans[i]; + mlx4_warn(dev, "Requested number of VLANs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_vlans); + } + } + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] = dev_cap->reserved_qps; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] = + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] = + (1 << dev->caps.log_num_macs)* + (1 << dev->caps.log_num_vlans)* + (1 << dev->caps.log_num_prios)* + dev->caps.num_ports; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH; + + dev->caps.reserved_qps = dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH]; + return 0; } @@ -209,7 +257,8 @@ static int mlx4_init_cmpt_table(struct mlx4_dev *dev, u64 cmpt_base, ((u64) (MLX4_CMPT_TYPE_QP * cmpt_entry_sz) << MLX4_CMPT_SHIFT), cmpt_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) goto err; @@ -334,7 +383,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->qpc_base, dev_cap->qpc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map QP context memory, aborting.\n"); goto err_unmap_dmpt; @@ -344,7 +394,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->auxc_base, dev_cap->aux_entry_sz, dev->caps.num_qps, - 
dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map AUXC context memory, aborting.\n"); goto err_unmap_qp; @@ -354,7 +405,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->altc_base, dev_cap->altc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map ALTC context memory, aborting.\n"); goto err_unmap_auxc; @@ -364,7 +416,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->rdmarc_base, dev_cap->rdmarc_entry_sz << priv->qp_table.rdmarc_shift, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map RDMARC context memory, aborting\n"); goto err_unmap_altc; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 89d4ccc..b74405a 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -111,6 +111,7 @@ struct mlx4_bitmap { u32 last; u32 top; u32 max; + u32 effective_max; u32 mask; spinlock_t lock; unsigned long *table; @@ -290,6 +291,9 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index dff8e66..2d5be15 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -273,6 +273,7 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) { struct mlx4_qp_table *qp_table = 
&mlx4_priv(dev)->qp_table; int err; + int reserved_from_top = 0; spin_lock_init(&qp_table->lock); INIT_RADIX_TREE(&dev->qp_table_tree, GFP_ATOMIC); @@ -282,9 +283,43 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) * block of special QPs must be aligned to a multiple of 8, so * round up. */ - dev->caps.sqp_start = ALIGN(dev->caps.reserved_qps, 8); - err = mlx4_bitmap_init(&qp_table->bitmap, dev->caps.num_qps, - (1 << 24) - 1, dev->caps.sqp_start + 8); + dev->caps.sqp_start = + ALIGN(dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], 8); + + { + int sort[MLX4_QP_REGION_COUNT]; + int i, j, tmp; + int last_base = dev->caps.num_qps; + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) + sort[i] = i; + + for (i = MLX4_QP_REGION_COUNT; i > 0; --i) { + for (j = 2; j < i; ++j) { + if (dev->caps.reserved_qps_cnt[sort[j]] > + dev->caps.reserved_qps_cnt[sort[j - 1]]) { + tmp = sort[j]; + sort[j] = sort[j - 1]; + sort[j - 1] = tmp; + } + } + } + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) { + last_base -= dev->caps.reserved_qps_cnt[sort[i]]; + dev->caps.reserved_qps_base[sort[i]] = last_base; + reserved_from_top += + dev->caps.reserved_qps_cnt[sort[i]]; + } + + } + + err = mlx4_bitmap_init_with_effective_max(&qp_table->bitmap, + dev->caps.num_qps, + (1 << 23) - 1, + dev->caps.sqp_start + 8, + dev->caps.num_qps - + reserved_from_top); if (err) return err; @@ -297,6 +332,20 @@ void mlx4_cleanup_qp_table(struct mlx4_dev *dev) mlx4_bitmap_cleanup(&mlx4_priv(dev)->qp_table.bitmap); } +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, + int *base_qpn, int *cnt) +{ + if ((region < 0) || (region >= MLX4_QP_REGION_COUNT)) + return -EINVAL; + + *base_qpn = dev->caps.reserved_qps_base[region]; + *cnt = dev->caps.reserved_qps_cnt[region]; + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_get_region); + int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context) { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 
9c77bf3..955eeca 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -135,6 +135,18 @@ enum { MLX4_STAT_RATE_OFFSET = 5 }; +enum qp_region { + MLX4_QP_REGION_FW = 0, + MLX4_QP_REGION_ETH_ADDR, + MLX4_QP_REGION_FC_ADDR, + MLX4_QP_REGION_FC_EXCH, + MLX4_QP_REGION_COUNT +}; + +enum { + MLX4_NUM_FEXCH = 64 * 1024, +}; + static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor) { return (major << 32) | (minor << 16) | subminor; @@ -159,7 +171,6 @@ struct mlx4_caps { int max_rq_desc_sz; int max_qp_init_rdma; int max_qp_dest_rdma; - int reserved_qps; int sqp_start; int num_srqs; int max_srq_wqes; @@ -189,6 +200,12 @@ struct mlx4_caps { u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; int max_gso_sz; + int reserved_qps_cnt[MLX4_QP_REGION_COUNT]; + int reserved_qps; + int reserved_qps_base[MLX4_QP_REGION_COUNT]; + int log_num_macs; + int log_num_vlans; + int log_num_prios; }; struct mlx4_buf_list { diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index a5e43fe..5a02980 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -303,4 +316,8 @@ static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn) void mlx4_qp_remove(struct mlx4_dev *dev, struct mlx4_qp *qp); +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, + int *base_qpn, int *cnt); + #endif /* MLX4_QP_H */ -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:00:14 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:00:14 +0300 Subject: [ofa-general][PATCH 5/12 v1] mlx4: Different port type support Message-ID: <480F4EFE.7020807@mellanox.co.il> >From 0d3da6ad682c4655cd909aefe5bc294c55f5f711 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 17:40:57 +0300 Subject: [PATCH] mlx4: Different port type support Multi protocol supports different port types. The port types are delivered through module parameters, crossed with firmware capabilities. 
Each consumer of mlx4_core should query for supported port types, mlx4_ib can no longer assume that all physical ports belong to it. Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/mad.c | 6 +- drivers/infiniband/hw/mlx4/main.c | 12 ++++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 + drivers/net/mlx4/fw.c | 4 ++ drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/main.c | 84 ++++++++++++++++++++++++++++++++++ include/linux/mlx4/device.h | 32 +++++++++++++ 7 files changed, 136 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 4c1e72f..d91ba56 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -297,7 +297,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) int p, q; int ret; - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) { agent = ib_register_mad_agent(&dev->ib_dev, p + 1, q ? IB_QPT_GSI : IB_QPT_SMI, @@ -313,7 +313,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) return 0; err: - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) if (dev->send_agent[p][q]) ib_unregister_mad_agent(dev->send_agent[p][q]); @@ -326,7 +326,7 @@ void mlx4_ib_mad_cleanup(struct mlx4_ib_dev *dev) struct ib_mad_agent *agent; int p, q; - for (p = 0; p < dev->dev->caps.num_ports; ++p) { + for (p = 0; p < dev->num_ports; ++p) { for (q = 0; q <= 1; ++q) { agent = dev->send_agent[p][q]; dev->send_agent[p][q] = NULL; diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 3c7f938..507dbe3 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -549,11 +549,15 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) MLX4_INIT_DOORBELL_LOCK(&ibdev->uar_lock); ibdev->dev = dev; + ibdev->ports_map = mlx4_get_ports_of_type(dev, MLX4_PORT_TYPE_IB); strlcpy(ibdev->ib_dev.name, "mlx4_%d", IB_DEVICE_NAME_MAX); 
ibdev->ib_dev.owner = THIS_MODULE; ibdev->ib_dev.node_type = RDMA_NODE_IB_CA; - ibdev->ib_dev.phys_port_cnt = dev->caps.num_ports; + ibdev->num_ports = 0; + mlx4_foreach_port(i, ibdev->ports_map) + ibdev->num_ports++; + ibdev->ib_dev.phys_port_cnt = ibdev->num_ports; ibdev->ib_dev.num_comp_vectors = 1; ibdev->ib_dev.dma_device = &dev->pdev->dev; @@ -667,7 +671,7 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr) struct mlx4_ib_dev *ibdev = ibdev_ptr; int p; - for (p = 1; p <= dev->caps.num_ports; ++p) + for (p = 1; p <= ibdev->num_ports; ++p) mlx4_CLOSE_PORT(dev, p); mlx4_ib_mad_cleanup(ibdev); @@ -682,6 +686,10 @@ static void mlx4_ib_event(struct mlx4_dev *dev, void *ibdev_ptr, enum mlx4_dev_event event, int port) { struct ib_event ibev; + struct mlx4_ib_dev *ibdev = to_mdev((struct ib_device *) ibdev_ptr); + + if (port > ibdev->num_ports) + return; switch (event) { case MLX4_DEV_EVENT_PORT_UP: diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 5cf9947..9d4f7a7 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -155,6 +155,8 @@ struct mlx4_ib_ah { struct mlx4_ib_dev { struct ib_device ib_dev; struct mlx4_dev *dev; + u32 ports_map; + int num_ports; void __iomem *uar_map; struct mlx4_uar priv_uar; diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index b0ad0d1..e875b08 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -322,6 +322,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); } } else { +#define QUERY_PORT_SUPPORTED_TYPE_OFFSET 0x00 #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 @@ -334,6 +335,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) if (err) goto out; + MLX4_GET(field, outbox, + QUERY_PORT_SUPPORTED_TYPE_OFFSET); + dev_cap->supported_port_types[i] = 
field & 3; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); dev_cap->max_mtu[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index a2e827c..50a6a7d 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -97,6 +97,7 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 supported_port_types[MLX4_MAX_PORTS + 1]; u8 log_max_macs[MLX4_MAX_PORTS + 1]; u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index f309532..1651d8e 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -100,11 +100,50 @@ module_param_named(use_prio, use_prio, bool, 0444); MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " "(0/1, default 0)"); +static char *port_type_arr[MLX4_MAX_PORTS] = { [0 ... (MLX4_MAX_PORTS-1)] = "ib"}; +module_param_array_named(port_type, port_type_arr, charp, NULL, 0444); +MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " + "comma separated, default ib for all)"); + +static int mlx4_check_port_params(struct mlx4_dev *dev, + enum mlx4_port_type *port_type) +{ + if (port_type[0] != port_type[1] && + !(dev->caps.flags & MLX4_DEV_CAP_FLAG_DPDP)) { + mlx4_err(dev, "Only same port types supported " + "on this HCA, aborting.\n"); + return -EINVAL; + } + if ((port_type[0] == MLX4_PORT_TYPE_ETH) && + (port_type[1] == MLX4_PORT_TYPE_IB)) { + mlx4_err(dev, "eth-ib configuration is not supported.\n"); + return -EINVAL; + } + return 0; + } + +static void mlx4_str2port_type(char **port_str, + enum mlx4_port_type *port_type) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (!strcmp(port_str[i], "eth")) + port_type[i] = MLX4_PORT_TYPE_ETH; + else + port_type[i] = MLX4_PORT_TYPE_IB; + } +} + + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; int i; + enum mlx4_port_type port_type[MLX4_MAX_PORTS]; + + 
mlx4_str2port_type(port_type_arr, port_type); err = mlx4_QUERY_DEV_CAP(dev, dev_cap); if (err) { @@ -180,7 +219,24 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.log_num_vlans = ilog2(roundup_pow_of_two(num_vlan + 2)); dev->caps.log_num_prios = use_prio ? 3: 0; + err = mlx4_check_port_params(dev, port_type); + if (err) + return err; + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (!dev_cap->supported_port_types[i]) { + mlx4_warn(dev, "FW doesn't support Multi Protocol, " + "loading IB only\n"); + dev->caps.port_type[i] = MLX4_PORT_TYPE_IB; + continue; + } + if (port_type[i-1] & dev_cap->supported_port_types[i]) + dev->caps.port_type[i] = port_type[i-1]; + else { + mlx4_err(dev, "Requested port type for port %d " + "not supported by HW\n", i); + return -ENODEV; + } if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { dev->caps.log_num_macs = dev_cap->log_max_macs[i]; mlx4_warn(dev, "Requested number of MACs is too much " @@ -1004,10 +1060,38 @@ static struct pci_driver mlx4_driver = { .remove = __devexit_p(mlx4_remove_one) }; +static int __init mlx4_verify_params(void) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; ++i) { + if (strcmp(port_type_arr[i], "eth") && + strcmp(port_type_arr[i], "ib")) { + printk(KERN_WARNING "mlx4_core: bad port_type for " + "port %d: %s\n", i, port_type_arr[i]); + return -1; + } + } + if ((num_mac < 1) || (num_mac > 127)) { + printk(KERN_WARNING "mlx4_core: bad num_mac: %d\n", num_mac); + return -1; + } + + if ((num_vlan < 0) || (num_vlan > 126)) { + printk(KERN_WARNING "mlx4_core: bad num_vlan: %d\n", num_vlan); + return -1; + } + + return 0; +} + static int __init mlx4_init(void) { int ret; + if (mlx4_verify_params()) + return -EINVAL; + ret = mlx4_catas_init(); if (ret) return ret; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 955eeca..4279b2f 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -62,6 +62,7 @@ enum { 
MLX4_DEV_CAP_FLAG_IPOIB_CSUM = 1 << 7, MLX4_DEV_CAP_FLAG_BAD_PKEY_CNTR = 1 << 8, MLX4_DEV_CAP_FLAG_BAD_QKEY_CNTR = 1 << 9, + MLX4_DEV_CAP_FLAG_DPDP = 1 << 12, MLX4_DEV_CAP_FLAG_MEM_WINDOW = 1 << 16, MLX4_DEV_CAP_FLAG_APM = 1 << 17, MLX4_DEV_CAP_FLAG_ATOMIC = 1 << 18, @@ -143,6 +144,11 @@ enum qp_region { MLX4_QP_REGION_COUNT }; +enum mlx4_port_type { + MLX4_PORT_TYPE_IB = 1 << 0, + MLX4_PORT_TYPE_ETH = 1 << 1, +}; + enum { MLX4_NUM_FEXCH = 64 * 1024, }; @@ -206,6 +212,7 @@ struct mlx4_caps { int log_num_macs; int log_num_vlans; int log_num_prios; + enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1]; }; struct mlx4_buf_list { @@ -365,6 +372,31 @@ struct mlx4_init_port_param { u64 si_guid; }; +static inline void mlx4_query_steer_cap(struct mlx4_dev *dev, int *log_mac, + int *log_vlan, int *log_prio) +{ + *log_mac = dev->caps.log_num_macs; + *log_vlan = dev->caps.log_num_vlans; + *log_prio = dev->caps.log_num_prios; +} + +static inline u32 mlx4_get_ports_of_type(struct mlx4_dev *dev, + enum mlx4_port_type ptype) +{ + u32 ret = 0; + int i; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.port_type[i] == ptype) + ret |= 1 << (i-1); + } + return ret; +} + +#define mlx4_foreach_port(port, bitmap) \ + for ((port) = 1; (port) <= MLX4_MAX_PORTS; (port)++) \ + if (bitmap & 1 << ((port)-1)) + int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, struct mlx4_buf *buf); void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:02:09 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:02:09 +0300 Subject: [ofa-general][PATCH 6/12 1] mlx4: Port Ethernet mtu capabilities handle Message-ID: <480F4F71.2000707@mellanox.co.il> >From a37cec875c323ddebe4f0289e4bab774fd9ec0f4 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 13:25:19 +0300 Subject: [PATCH] mlx4: Port Ethernet mtu capabilities handle Ethernet max mtu and default Mac address 
are revealed through QUERY_DEV_CAP command. The received mtu is crossed with requested max mtu (passed by module parameter). Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/fw.c | 11 ++++++----- drivers/net/mlx4/fw.h | 4 +++- drivers/net/mlx4/main.c | 15 ++++++++++++++- include/linux/mlx4/device.h | 4 +++- 4 files changed, 26 insertions(+), 8 deletions(-) diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index e875b08..1cbc30f 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -314,7 +314,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET); dev_cap->max_vl[i] = field >> 4; MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET); - dev_cap->max_mtu[i] = field >> 4; + dev_cap->ib_mtu[i] = field >> 4; dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET); dev_cap->max_gids[i] = 1 << (field & 0xf); @@ -339,7 +339,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) QUERY_PORT_SUPPORTED_TYPE_OFFSET); dev_cap->supported_port_types[i] = field & 3; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); - dev_cap->max_mtu[i] = field & 0xf; + dev_cap->ib_mtu[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET); dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_MAX_GID_PKEY_OFFSET); @@ -350,7 +350,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); dev_cap->log_max_macs[i] = field & 0xf; dev_cap->log_max_vlans[i] = field >> 4; - + dev_cap->eth_mtu[i] = be16_to_cpu(((u16 *) outbox)[1]); + dev_cap->def_mac[i] = be64_to_cpu(((u64 *) outbox)[2]); } } @@ -388,7 +389,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) mlx4_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n", dev_cap->max_cq_sz, dev_cap->max_qp_sz, dev_cap->max_srq_sz); mlx4_dbg(dev, "Local 
CA ACK delay: %d, max MTU: %d, port width cap: %d\n", - dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu[1], + dev_cap->local_ca_ack_delay, 128 << dev_cap->ib_mtu[1], dev_cap->max_port_width[1]); mlx4_dbg(dev, "Max SQ desc size: %d, max SQ S/G: %d\n", dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); @@ -796,7 +797,7 @@ int mlx4_INIT_PORT(struct mlx4_dev *dev, int port) flags |= (dev->caps.port_width_cap[port] & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT; MLX4_PUT(inbox, flags, INIT_PORT_FLAGS_OFFSET); - field = 128 << dev->caps.mtu_cap[port]; + field = 128 << dev->caps.ib_mtu_cap[port]; MLX4_PUT(inbox, field, INIT_PORT_MTU_OFFSET); field = dev->caps.gid_table_len[port]; MLX4_PUT(inbox, field, INIT_PORT_MAX_GID_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index 50a6a7d..ef964d5 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -61,11 +61,13 @@ struct mlx4_dev_cap { int local_ca_ack_delay; int num_ports; u32 max_msg_sz; - int max_mtu[MLX4_MAX_PORTS + 1]; + int ib_mtu[MLX4_MAX_PORTS + 1]; int max_port_width[MLX4_MAX_PORTS + 1]; int max_vl[MLX4_MAX_PORTS + 1]; int max_gids[MLX4_MAX_PORTS + 1]; int max_pkeys[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu[MLX4_MAX_PORTS + 1]; u16 stat_rate_support; u32 flags; int reserved_uars; diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 1651d8e..754c07c 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -104,6 +104,11 @@ static struct mlx4_profile default_profile = { module_param_array_named(port_type, port_type_arr, charp, NULL, 0444); MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " "comma separated, default ib for all)"); + +static int port_mtu[MLX4_MAX_PORTS] = { [0 ... 
(MLX4_MAX_PORTS-1)] = 9600}; +module_param_array_named(port_mtu, port_mtu, int, NULL, 0444); +MODULE_PARM_DESC(port_mtu, "Ports max mtu in Bytes, entry per port, " + "comma separated, default 9600 for all"); static int mlx4_check_port_params(struct mlx4_dev *dev, enum mlx4_port_type *port_type) @@ -175,10 +180,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.num_ports = dev_cap->num_ports; for (i = 1; i <= dev->caps.num_ports; ++i) { dev->caps.vl_cap[i] = dev_cap->max_vl[i]; - dev->caps.mtu_cap[i] = dev_cap->max_mtu[i]; + dev->caps.ib_mtu_cap[i] = dev_cap->ib_mtu[i]; dev->caps.gid_table_len[i] = dev_cap->max_gids[i]; dev->caps.pkey_table_len[i] = dev_cap->max_pkeys[i]; dev->caps.port_width_cap[i] = dev_cap->max_port_width[i]; + dev->caps.eth_mtu_cap[i] = dev_cap->eth_mtu[i]; + dev->caps.def_mac[i] = dev_cap->def_mac[i]; } dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE; @@ -237,6 +244,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) "not supported by HW\n", i); return -ENODEV; } + if (port_mtu[i-1] <= dev->caps.eth_mtu_cap[i]) + dev->caps.eth_mtu_cap[i] = port_mtu[i-1]; + else + mlx4_warn(dev, "Requested mtu for port %d is larger " + "than supported, reducing to %d\n", + i, dev->caps.eth_mtu_cap[i]); if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { dev->caps.log_num_macs = dev_cap->log_max_macs[i]; mlx4_warn(dev, "Requested number of MACs is too much " diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index b114ef3..4ca3a00 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -162,7 +162,9 @@ struct mlx4_caps { u64 fw_ver; int num_ports; int vl_cap[MLX4_MAX_PORTS + 1]; - int mtu_cap[MLX4_MAX_PORTS + 1]; + int ib_mtu_cap[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu_cap[MLX4_MAX_PORTS + 1]; int gid_table_len[MLX4_MAX_PORTS + 1]; int pkey_table_len[MLX4_MAX_PORTS + 1]; int local_ca_ack_delay; -- 1.5.4 From
yevgenyp at mellanox.co.il Wed Apr 23 08:03:51 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:03:51 +0300 Subject: [ofa-general][PATCH 7/12 v1] mlx4: Mac Vlan Management Message-ID: <480F4FD7.4010706@mellanox.co.il> >From 93d41d72b8878bfd8d67b6a48b70c392f108fe58 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 14:28:36 +0300 Subject: [PATCH] mlx4: Mac Vlan Management mlx4_core is now responsible for managing Mac and Vlan filters for each port. It also notifies the FW which port type will be loaded, using the SET_PORT command Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/Makefile | 2 +- drivers/net/mlx4/main.c | 18 +++ drivers/net/mlx4/mlx4.h | 35 ++++++ drivers/net/mlx4/port.c | 278 +++++++++++++++++++++++++++++++++++++++++++ include/linux/mlx4/cmd.h | 9 ++ include/linux/mlx4/device.h | 6 + 6 files changed, 347 insertions(+), 1 deletions(-) create mode 100644 drivers/net/mlx4/port.c diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile index 0952a65..f4932d8 100644 --- a/drivers/net/mlx4/Makefile +++ b/drivers/net/mlx4/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o icm.o intf.o main.o mcg.o \ - mr.o pd.o profile.o qp.o reset.o srq.o + mr.o pd.o profile.o qp.o reset.o srq.o port.o diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 754c07c..a528809 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -678,6 +678,7 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); int err; + int port; err = mlx4_init_uar_table(dev); if (err) { @@ -776,8 +777,25 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) goto err_qp_table_free; } + for (port = 1; port <= dev->caps.num_ports; port++) { + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, aborting\n", + port); + goto err_mcg_table_free; + } + } + + for (port = 0; port < 
dev->caps.num_ports; port++) { + mlx4_init_mac_table(dev, port); + mlx4_init_vlan_table(dev, port); + } + return 0; +err_mcg_table_free: + mlx4_cleanup_mcg_table(dev); + err_qp_table_free: mlx4_cleanup_qp_table(dev); diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index b74405a..eff1c5a 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -251,6 +251,35 @@ struct mlx4_catas_err { struct list_head list; }; +struct mlx4_mac_table { +#define MLX4_MAX_MAC_NUM 128 +#define MLX4_MAC_MASK 0xffffffffffff +#define MLX4_MAC_VALID_SHIFT 63 +#define MLX4_MAC_TABLE_SIZE MLX4_MAX_MAC_NUM << 3 + __be64 entries[MLX4_MAX_MAC_NUM]; + int refs[MLX4_MAX_MAC_NUM]; + struct semaphore mac_sem; + int total; + int max; +}; + +struct mlx4_vlan_table { +#define MLX4_MAX_VLAN_NUM 126 +#define MLX4_VLAN_MASK 0xfff +#define MLX4_VLAN_VALID 1 << 31 +#define MLX4_VLAN_TABLE_SIZE MLX4_MAX_VLAN_NUM << 2 + __be32 entries[MLX4_MAX_VLAN_NUM]; + int refs[MLX4_MAX_VLAN_NUM]; + struct semaphore vlan_sem; + int total; + int max; +}; + +struct mlx4_port_info { + struct mlx4_mac_table mac_table; + struct mlx4_vlan_table vlan_table; +}; + struct mlx4_priv { struct mlx4_dev dev; @@ -279,6 +308,7 @@ struct mlx4_priv { struct mlx4_uar driver_uar; void __iomem *kar; + struct mlx4_port_info port[MLX4_MAX_PORTS]; }; static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) @@ -351,4 +381,9 @@ void mlx4_srq_event(struct mlx4_dev *dev, u32 srqn, int event_type); void mlx4_handle_catas_err(struct mlx4_dev *dev); +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port); +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port); + #endif /* MLX4_H */ diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c new file mode 100644 index 0000000..910fc35 --- /dev/null +++ b/drivers/net/mlx4/port.c @@ -0,0 +1,278 @@ +/* + * Copyright (c) 2007 Mellanox Technologies. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include +#include + +#include + +#include "mlx4.h" + +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port].mac_table; + int i; + + sema_init(&table->mac_sem, 1); + for (i = 0; i < MLX4_MAX_MAC_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_macs; + table->total = 0; +} + +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port].vlan_table; + int i; + + sema_init(&table->vlan_sem, 1); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_vlans; + table->total = 0; +} + +static int mlx4_SET_PORT_mac_table(struct mlx4_dev *dev, u8 port, + __be64 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_MAC_TABLE_SIZE); + + in_mod = MLX4_SET_PORT_MAC_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} + +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port - 1].mac_table; + int i, err = 0; + int free = -1; + u64 valid = 1; + + mlx4_dbg(dev, "Registering mac: 0x%llx\n", mac); + down(&table->mac_sem); + for (i = 0; i < MLX4_MAX_MAC_NUM - 1; i++) { + if (free < 0 && !table->refs[i]) { + free = i; + continue; + } + + if (mac == (MLX4_MAC_MASK & be64_to_cpu(table->entries[i]))) { + /* Mac already registered, increase reference count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + mlx4_dbg(dev, "Free mac index is %d\n", free); + + if (table->total == table->max) { + /* No free mac entries */ + err = -ENOSPC; + goto out; + } + + /* Register new MAC */ 
+ table->refs[free] = 1; + table->entries[free] = cpu_to_be64(mac | valid << MLX4_MAC_VALID_SHIFT); + + err = mlx4_SET_PORT_mac_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_err(dev, "Failed adding mac: 0x%llx\n", mac); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->mac_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_mac); + +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port - 1].mac_table; + + down(&table->mac_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No mac entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_warn(dev, "Have more references for index %d," + "no need to modify mac table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_mac_table(dev, port, table->entries); + --table->total; +out: + up(&table->mac_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_mac); + +static int mlx4_SET_PORT_vlan_table(struct mlx4_dev *dev, u8 port, + __be32 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_VLAN_TABLE_SIZE); + in_mod = MLX4_SET_PORT_VLAN_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + + return err; +} + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port - 1].vlan_table; + int i, err = 0; + int free = -1; + + down(&table->vlan_sem); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + if (free < 0 && (table->refs[i] == 0)) { + free = i; + continue; + } + + if (table->refs[i] && + (vlan == (MLX4_VLAN_MASK & + be32_to_cpu(table->entries[i])))) { + /* Vlan already registered, increase 
reference count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + + if (table->total == table->max) { + /* No free vlan entries */ + err = -ENOSPC; + goto out; + } + + /* Register new VLAN */ + table->refs[free] = 1; + table->entries[free] = cpu_to_be32(vlan | MLX4_VLAN_VALID); + + err = mlx4_SET_PORT_vlan_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_warn(dev, "Failed adding vlan: %u\n", vlan); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->vlan_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_vlan); + +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port - 1].vlan_table; + + down(&table->vlan_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No vlan entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_dbg(dev, "Have more references for index %d, " + "no need to modify vlan table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_vlan_table(dev, port, table->entries); + --table->total; +out: + up(&table->vlan_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_vlan); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_cmd_mailbox *mailbox; + int err; + u8 is_eth = (dev->caps.port_type[port] == MLX4_PORT_TYPE_ETH) ? 
1 : 0; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memset(mailbox->buf, 0, 256); + if (is_eth) { + ((u8 *) mailbox->buf)[3] = 7; + ((__be16 *) mailbox->buf)[3] = + cpu_to_be16(dev->caps.eth_mtu_cap[port] + + ETH_HLEN + ETH_FCS_LEN); + ((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15); + ((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15); + } + err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h index 77323a7..cf9c679 100644 --- a/include/linux/mlx4/cmd.h +++ b/include/linux/mlx4/cmd.h @@ -132,6 +132,15 @@ enum { MLX4_MAILBOX_SIZE = 4096 }; +enum { + /* set port opcode modifiers */ + MLX4_SET_PORT_GENERAL = 0x0, + MLX4_SET_PORT_RQP_CALC = 0x1, + MLX4_SET_PORT_MAC_TABLE = 0x2, + MLX4_SET_PORT_VLAN_TABLE = 0x3, + MLX4_SET_PORT_PRIO_MAP = 0x4, +}; + struct mlx4_dev; struct mlx4_cmd_mailbox { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index b114ef3..4ca3a00 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -461,6 +461,12 @@ int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port); int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index); +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index); + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index); +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); + int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:05:10 2008 From: 
yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:05:10 +0300 Subject: [ofa-general][PATCH 8/12 v1] mlx4: Dynamic port configuration Message-ID: <480F5026.9070400@mellanox.co.il> >From e13bef843cb2c7cee5a0ba388d97e21188087424 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 15:14:30 +0300 Subject: [PATCH] mlx4: Dynamic port configuration Port type can be set using sysfs interface when the low level driver is up. The low level driver unregisters all its customers and then registers them again with the new port types (which they query for in add_one) Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/main.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 97 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index a528809..e3fd4e9 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -281,6 +281,96 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) return 0; } +static int mlx4_change_port_types(struct mlx4_dev *dev, + enum mlx4_port_type *port_types) +{ + int i; + int err = 0; + int change = 0; + int port; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (port_types[i] != dev->caps.port_type[i + 1]) { + change = 1; + dev->caps.port_type[i + 1] = port_types[i]; + } + } + if (change) { + mlx4_unregister_device(dev); + for (port = 1; port <= dev->caps.num_ports; port++) { + mlx4_CLOSE_PORT(dev, port); + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, " + "aborting\n", port); + return err; + } + } + err = mlx4_register_device(dev); + } + return err; +} + +static ssize_t show_port_type(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + int i; + + sprintf(buf, "Current port types:\n"); + for (i = 1; i <= MLX4_MAX_PORTS; i++) { + sprintf(buf, "%sPort%d: %s\n", buf, i, + 
(mdev->caps.port_type[i] == MLX4_PORT_TYPE_IB)? + "ib": "eth"); + } + return strlen(buf); +} + +static ssize_t set_port_type(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + char *type; + enum mlx4_port_type port_types[MLX4_MAX_PORTS]; + char *loc_buf; + char *ptr; + int i; + int err = 0; + + loc_buf = kmalloc(count + 1, GFP_KERNEL); + if (!loc_buf) + return -ENOMEM; + + ptr = loc_buf; + memcpy(loc_buf, buf, count + 1); + for (i = 0; i < MLX4_MAX_PORTS; i++) { + type = strsep(&loc_buf, ","); + if (!strcmp(type, "ib")) + port_types[i] = MLX4_PORT_TYPE_IB; + else if (!strcmp(type, "eth")) + port_types[i] = MLX4_PORT_TYPE_ETH; + else { + dev_warn(dev, "%s is not acceptable port type " + "(use 'eth' or 'ib' only)\n", type); + err = -EINVAL; + goto out; + } + } + err = mlx4_check_port_params(mdev, port_types); + if (err) + goto out; + + err = mlx4_change_port_types(mdev, port_types); +out: + kfree(ptr); + return err ? 
err: count; +} +static DEVICE_ATTR(mlx4_port_type, S_IWUGO | S_IRUGO, show_port_type, set_port_type); + static int mlx4_load_fw(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -979,8 +1069,14 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) pci_set_drvdata(pdev, dev); + if (device_create_file(&pdev->dev, &dev_attr_mlx4_port_type)) + goto err_sysfs; + return 0; +err_sysfs: + mlx4_unregister_device(dev); + err_cleanup: mlx4_cleanup_mcg_table(dev); mlx4_cleanup_qp_table(dev); @@ -1036,6 +1132,7 @@ static void mlx4_remove_one(struct pci_dev *pdev) int p; if (dev) { + device_remove_file(&pdev->dev, &dev_attr_mlx4_port_type); mlx4_unregister_device(dev); for (p = 1; p <= dev->caps.num_ports; ++p) -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:06:21 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:06:21 +0300 Subject: [ofa-general][PATCH 9/12 v1] mlx4: Collapsed CQ support Message-ID: <480F506D.9020202@mellanox.co.il> >From 749a2b62acc505a9ab2437eddb4cdd45503183d0 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 15:50:51 +0300 Subject: [PATCH] mlx4: Collapsed CQ support Changed cq creation API to support the creation of collapsed cqs. 
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/net/mlx4/cq.c | 4 +++- include/linux/mlx4/device.h | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 5e570bb..63daf52 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq); + cq->db.dma, &cq->mcq, 0); if (err) goto err_dbmap; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index caa5bcf..d893cc1 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -188,7 +188,8 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -224,6 +225,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context = mailbox->buf; memset(cq_context, 0, sizeof *cq_context); + cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 4ca3a00..93c17aa 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -440,7 +440,8 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int size); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); 
+ struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:07:50 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:07:50 +0300 Subject: [ofa-general][PATCH 10/12 v1] mlx4: Completion EQ per CPU Message-ID: <480F50C6.80109@mellanox.co.il> >From 2a2d22208f6fdba4c0c2afdf0ed12ef07b93d661 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 16:39:47 +0300 Subject: [PATCH] mlx4: Completion EQ per cpu Completion eq's are created per cpu. Created cq's are attached to an eq by "Round Robin" algorithm, unless a specific eq was requested. Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/net/mlx4/cq.c | 19 ++++++++++++++++--- drivers/net/mlx4/eq.c | 39 ++++++++++++++++++++++++++------------- drivers/net/mlx4/main.c | 14 ++++++++------ drivers/net/mlx4/mlx4.h | 6 ++++-- include/linux/mlx4/device.h | 3 ++- 6 files changed, 57 insertions(+), 26 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 63daf52..732f812 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq, 0); + cq->db.dma, &cq->mcq, vector, 0); if (err) goto err_dbmap; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index d893cc1..bbb4c7b 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -189,7 +189,7 @@ EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed) + unsigned vector, int collapsed) { struct mlx4_priv *priv = 
mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -227,7 +227,20 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); - cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; + + if (vector > priv->eq_table.num_comp_eqs) { + err = -EINVAL; + goto err_radix; + } + + if (vector == 0) { + vector = priv->eq_table.last_comp_eq % + priv->eq_table.num_comp_eqs + 1; + priv->eq_table.last_comp_eq = vector; + } + cq->comp_eq_idx = MLX4_EQ_COMP_CPU0 + vector - 1; + cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + vector - 1].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; mtt_addr = mlx4_mtt_addr(dev, mtt); @@ -276,7 +289,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) if (err) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); - synchronize_irq(priv->eq_table.eq[MLX4_EQ_COMP].irq); + synchronize_irq(priv->eq_table.eq[cq->comp_eq_idx].irq); spin_lock_irq(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index e141a15..b4676db 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -265,7 +265,7 @@ static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) writel(priv->eq_table.clr_mask, priv->eq_table.clr_int); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) work |= mlx4_eq_int(dev, &priv->eq_table.eq[i]); return IRQ_RETVAL(work); @@ -482,7 +482,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + eq_table->num_comp_eqs; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); } @@ -553,6 +553,7 @@ void mlx4_unmap_eq_icm(struct mlx4_dev 
*dev) int mlx4_init_eq_table(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); + int req_eqs; int err; int i; @@ -573,11 +574,22 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) priv->eq_table.clr_int = priv->clr_base + (priv->eq_table.inta_pin < 32 ? 4 : 0); - err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_COMP : 0, - &priv->eq_table.eq[MLX4_EQ_COMP]); - if (err) - goto err_out_unmap; + priv->eq_table.num_comp_eqs = 0; + req_eqs = (dev->flags & MLX4_FLAG_MSI_X) ? num_online_cpus() : 1; + while (req_eqs) { + err = mlx4_create_eq( + dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, + (dev->flags & MLX4_FLAG_MSI_X) ? + (MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs) : 0, + &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs]); + if (err) + goto err_out_comp; + + priv->eq_table.num_comp_eqs++; + req_eqs--; + } + priv->eq_table.last_comp_eq = 0; err = mlx4_create_eq(dev, MLX4_NUM_ASYNC_EQE + MLX4_NUM_SPARE_EQE, (dev->flags & MLX4_FLAG_MSI_X) ? 
MLX4_EQ_ASYNC : 0, @@ -587,11 +599,12 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) if (dev->flags & MLX4_FLAG_MSI_X) { static const char *eq_name[] = { - [MLX4_EQ_COMP] = DRV_NAME " (comp)", + [MLX4_EQ_COMP_CPU0...MLX4_NUM_EQ] = "comp_" DRV_NAME, [MLX4_EQ_ASYNC] = DRV_NAME " (async)" }; - for (i = 0; i < MLX4_NUM_EQ; ++i) { + for (i = 0; i < MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs; ++i) { err = request_irq(priv->eq_table.eq[i].irq, mlx4_msi_x_interrupt, 0, eq_name[i], priv->eq_table.eq + i); @@ -616,7 +629,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) mlx4_warn(dev, "MAP_EQ for async EQ %d failed (%d)\n", priv->eq_table.eq[MLX4_EQ_ASYNC].eqn, err); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) eq_set_ci(&priv->eq_table.eq[i], 1); return 0; @@ -625,9 +638,9 @@ err_out_async: mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_ASYNC]); err_out_comp: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP]); + for (i = 0; i < priv->eq_table.num_comp_eqs; ++i) + mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + i]); -err_out_unmap: mlx4_unmap_clr_int(dev); mlx4_free_irqs(dev); @@ -646,7 +659,7 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) mlx4_free_irqs(dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) mlx4_free_eq(dev, &priv->eq_table.eq[i]); mlx4_unmap_clr_int(dev); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index e3fd4e9..aecb1f2 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -922,22 +922,24 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); struct msix_entry entries[MLX4_NUM_EQ]; + int needed_vectors = MLX4_EQ_COMP_CPU0 + num_online_cpus(); int err; int i; if (msi_x) { - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) entries[i].entry = i; - err = pci_enable_msix(dev->pdev, entries, ARRAY_SIZE(entries)); + err 
= pci_enable_msix(dev->pdev, entries, needed_vectors); if (err) { if (err > 0) - mlx4_info(dev, "Only %d MSI-X vectors available, " - "not using MSI-X\n", err); + mlx4_info(dev, "Only %d MSI-X vectors " + "available, need %d. Not using MSI-X\n", + err, needed_vectors); goto no_msi; } - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = entries[i].vector; dev->flags |= MLX4_FLAG_MSI_X; @@ -945,7 +947,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) } no_msi: - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = dev->pdev->irq; } diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index eff1c5a..2201a99 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -64,8 +64,8 @@ enum { enum { MLX4_EQ_ASYNC, - MLX4_EQ_COMP, - MLX4_NUM_EQ + MLX4_EQ_COMP_CPU0, + MLX4_NUM_EQ = MLX4_EQ_COMP_CPU0 + NR_CPUS }; enum { @@ -211,6 +211,8 @@ struct mlx4_eq_table { void __iomem *uar_map[(MLX4_NUM_EQ + 6) / 4]; u32 clr_mask; struct mlx4_eq eq[MLX4_NUM_EQ]; + int num_comp_eqs; + int last_comp_eq; u64 icm_virt; struct page *icm_page; dma_addr_t icm_dma; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 93c17aa..673462c 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -312,6 +312,7 @@ struct mlx4_cq { int arm_sn; int cqn; + int comp_eq_idx; atomic_t refcount; struct completion free; @@ -441,7 +442,7 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed); + unsigned vector, int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:09:10 2008 From: yevgenyp at mellanox.co.il (Yevgeny 
Petrilin) Date: Wed, 23 Apr 2008 18:09:10 +0300 Subject: [ofa-general][PATCH 11/12 v1] mlx4: Fiber Channel support Message-ID: <480F5116.4040809@mellanox.co.il> >From ab14366d6cbf590c6a6a6a4d16e86a0d120facc6 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Wed, 23 Apr 2008 15:19:16 +0300 Subject: [PATCH] mlx4: Fiber Channel support As we did with QPs, some of the MPTs are pre-reserved (the MPTs that are mapped for FEXCHs, 2*64K of them). So we needed to split the operation of allocating an MPT in two: the allocation of a bit from the bitmap, and the actual creation of the entry (and its MTT). mr_alloc() is the part that allocates a number from the bitmap; mr_alloc_reserved() is the second part, where you already know which MPT number was allocated. Normal users keep using the original mr_alloc(). For FEXCH, when we know the pre-reserved MPT entry, we call mr_alloc_reserved() directly. The same goes for mr_free() and the corresponding mr_free_reserved(): the first just puts the bit back, the latter actually destroys the entry but leaves the bit set. map_phys_fmr_fbo() is very much like the original map_phys_fmr(), except that it:
- allows setting an FBO (First Byte Offset) for the MPT
- allows setting the data length for the MPT
- does not increase the higher bits of the key after every map.
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/main.c | 2 +- drivers/net/mlx4/mr.c | 131 +++++++++++++++++++++++++++++++++++++------ include/linux/mlx4/device.h | 18 ++++++ include/linux/mlx4/qp.h | 11 +++- 4 files changed, 142 insertions(+), 20 deletions(-) diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index aecb1f2..93a4e4b 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -81,7 +81,7 @@ static struct mlx4_profile default_profile = { .rdmarc_per_qp = 1 << 4, .num_cq = 1 << 16, .num_mcg = 1 << 13, - .num_mpt = 1 << 17, + .num_mpt = 1 << 18, .num_mtt = 1 << 20, }; diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index 79b317b..ae376ae 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -52,7 +52,9 @@ struct mlx4_mpt_entry { __be64 length; __be32 lkey; __be32 win_cnt; - u8 reserved1[3]; + u8 reserved1; + u8 flags2; + u8 reserved2; u8 mtt_rep; __be64 mtt_seg; __be32 mtt_sz; @@ -68,6 +70,8 @@ struct mlx4_mpt_entry { #define MLX4_MTT_FLAG_PRESENT 1 +#define MLX4_MPT_FLAG2_FBO_EN (1 << 7) + #define MLX4_MPT_STATUS_SW 0xF0 #define MLX4_MPT_STATUS_HW 0x00 @@ -122,7 +126,7 @@ static void mlx4_buddy_free(struct mlx4_buddy *buddy, u32 seg, int order) spin_unlock(&buddy->lock); } -static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order) +static int __devinit mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order) { int i, s; @@ -250,6 +254,21 @@ static int mlx4_HW2SW_MPT(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox !mailbox, MLX4_CMD_HW2SW_MPT, MLX4_CMD_TIME_CLASS_B); } +int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u64 iova, u64 size, u32 access, int npages, + int page_shift, struct mlx4_mr *mr) +{ + mr->iova = iova; + mr->size = size; + mr->pd = pd; + mr->access = access; + mr->enabled = 0; + mr->key = hw_index_to_key(mridx); + + return mlx4_mtt_init(dev, npages, page_shift, &mr->mtt); +} +EXPORT_SYMBOL_GPL(mlx4_mr_alloc_reserved); + int mlx4_mr_alloc(struct 
mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, int npages, int page_shift, struct mlx4_mr *mr) { @@ -261,14 +280,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, if (index == -1) return -ENOMEM; - mr->iova = iova; - mr->size = size; - mr->pd = pd; - mr->access = access; - mr->enabled = 0; - mr->key = hw_index_to_key(index); - - err = mlx4_mtt_init(dev, npages, page_shift, &mr->mtt); + err = mlx4_mr_alloc_reserved(dev, index, pd, iova, size, + access, npages, page_shift, mr); if (err) mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, index); @@ -276,9 +289,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, } EXPORT_SYMBOL_GPL(mlx4_mr_alloc); -void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) +void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr) { - struct mlx4_priv *priv = mlx4_priv(dev); int err; if (mr->enabled) { @@ -290,6 +302,13 @@ void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) } mlx4_mtt_cleanup(dev, &mr->mtt); +} +EXPORT_SYMBOL_GPL(mlx4_mr_free_reserved); + +void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + mlx4_mr_free_reserved(dev, mr); mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, key_to_hw_index(mr->key)); } EXPORT_SYMBOL_GPL(mlx4_mr_free); @@ -435,8 +454,15 @@ int mlx4_init_mr_table(struct mlx4_dev *dev) struct mlx4_mr_table *mr_table = &mlx4_priv(dev)->mr_table; int err; - err = mlx4_bitmap_init(&mr_table->mpt_bitmap, dev->caps.num_mpts, - ~0, dev->caps.reserved_mrws); + if (!is_power_of_2(dev->caps.num_mpts)) + return -EINVAL; + + dev->caps.reserved_fexch_mpts_base = dev->caps.num_mpts - + (2 * dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH]); + err = mlx4_bitmap_init_with_effective_max(&mr_table->mpt_bitmap, + dev->caps.num_mpts, + ~0, dev->caps.reserved_mrws, + dev->caps.reserved_fexch_mpts_base); if (err) return err; @@ -500,8 +526,9 @@ static inline int mlx4_check_fmr(struct 
mlx4_fmr *fmr, u64 *page_list, return 0; } -int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, - int npages, u64 iova, u32 *lkey, u32 *rkey) +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, struct mlx4_fmr *fmr, + u64 *page_list, int npages, u64 iova, u32 fbo, + u32 len, u32 *lkey, u32 *rkey, int same_key) { u32 key; int i, err; @@ -513,7 +540,8 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list ++fmr->maps; key = key_to_hw_index(fmr->mr.key); - key += dev->caps.num_mpts; + if (same_key) + key += dev->caps.num_mpts; *lkey = *rkey = fmr->mr.key = hw_index_to_key(key); *(u8 *) fmr->mpt = MLX4_MPT_STATUS_SW; @@ -529,8 +557,10 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list fmr->mpt->key = cpu_to_be32(key); fmr->mpt->lkey = cpu_to_be32(key); - fmr->mpt->length = cpu_to_be64(npages * (1ull << fmr->page_shift)); + fmr->mpt->length = cpu_to_be64(len); fmr->mpt->start = cpu_to_be64(iova); + fmr->mpt->first_byte_offset = cpu_to_be32(fbo & 0x001fffff); + fmr->mpt->flags2 = (fbo ? 
MLX4_MPT_FLAG2_FBO_EN : 0); /* Make MTT entries are visible before setting MPT status */ wmb(); @@ -542,6 +572,16 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list return 0; } +EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr_fbo); + +int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, + int npages, u64 iova, u32 *lkey, u32 *rkey) +{ + u32 len = npages * (1ull << fmr->page_shift); + + return mlx4_map_phys_fmr_fbo(dev, fmr, page_list, npages, iova, 0, + len, lkey, rkey, 1); +} EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, @@ -586,6 +626,49 @@ err_free: } EXPORT_SYMBOL_GPL(mlx4_fmr_alloc); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, + u32 pd, u32 access, int max_pages, + int max_maps, u8 page_shift, struct mlx4_fmr *fmr) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + u64 mtt_seg; + int err = -ENOMEM; + + if (page_shift < 12 || page_shift >= 32) + return -EINVAL; + + /* All MTTs must fit in the same page */ + if (max_pages * sizeof *fmr->mtts > PAGE_SIZE) + return -EINVAL; + + fmr->page_shift = page_shift; + fmr->max_pages = max_pages; + fmr->max_maps = max_maps; + fmr->maps = 0; + + err = mlx4_mr_alloc_reserved(dev, mridx, pd, 0, 0, access, max_pages, + page_shift, &fmr->mr); + if (err) + return err; + + mtt_seg = fmr->mr.mtt.first_seg * dev->caps.mtt_entry_sz; + + fmr->mtts = mlx4_table_find(&priv->mr_table.mtt_table, + fmr->mr.mtt.first_seg, + &fmr->dma_handle); + if (!fmr->mtts) { + err = -ENOMEM; + goto err_free; + } + + return 0; + +err_free: + mlx4_mr_free_reserved(dev, &fmr->mr); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_alloc_reserved); + int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -634,6 +717,18 @@ int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr) } EXPORT_SYMBOL_GPL(mlx4_fmr_free); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct 
mlx4_fmr *fmr) +{ + if (fmr->maps) + return -EBUSY; + + fmr->mr.enabled = 0; + mlx4_mr_free_reserved(dev, &fmr->mr); + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_free_reserved); + int mlx4_SYNC_TPT(struct mlx4_dev *dev) { return mlx4_cmd(dev, 0, 0, 0, MLX4_CMD_SYNC_TPT, 1000); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 673462c..e417673 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -215,6 +215,7 @@ struct mlx4_caps { int log_num_vlans; int log_num_prios; enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1]; + int reserved_fexch_mpts_base; }; struct mlx4_buf_list { @@ -400,6 +401,12 @@ static inline u32 mlx4_get_ports_of_type(struct mlx4_dev *dev, for ((port) = 1; (port) <= MLX4_MAX_PORTS; ++(port)) \ if (bitmap & 1 << ((port)-1)) + +static inline int mlx4_get_fexch_mpts_base(struct mlx4_dev *dev) +{ + return dev->caps.reserved_fexch_mpts_base; +} + int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, struct mlx4_buf *buf); void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf); @@ -423,8 +430,12 @@ int mlx4_mtt_init(struct mlx4_dev *dev, int npages, int page_shift, void mlx4_mtt_cleanup(struct mlx4_dev *dev, struct mlx4_mtt *mtt); u64 mlx4_mtt_addr(struct mlx4_dev *dev, struct mlx4_mtt *mtt); +int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u64 iova, u64 size, u32 access, int npages, + int page_shift, struct mlx4_mr *mr); int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, int npages, int page_shift, struct mlx4_mr *mr); +void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr); void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr); int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr); int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, @@ -469,13 +480,20 @@ void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index); int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, 
int *index); void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, struct mlx4_fmr *fmr, + u64 *page_list, int npages, u64 iova, u32 fbo, + u32 len, u32 *lkey, u32 *rkey, int same_key); int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u32 access, int max_pages, int max_maps, + u8 page_shift, struct mlx4_fmr *fmr); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, int max_maps, u8 page_shift, struct mlx4_fmr *fmr); int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr); void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u32 *lkey, u32 *rkey); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_SYNC_TPT(struct mlx4_dev *dev); diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index a5e43fe..d7c0227 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -151,7 +151,16 @@ struct mlx4_qp_context { u8 reserved4[2]; u8 mtt_base_addr_h; __be32 mtt_base_addr_l; - u32 reserved5[10]; + u8 VE; + u8 reserved5; + __be16 VFT_id_prio; + u8 reserved6; + u8 exch_size; + __be16 exch_base; + u8 VFT_hop_cnt; + u8 my_fc_id_idx; + __be16 reserved7; + u32 reserved8[7]; }; /* Which firmware version adds support for NEC (NoErrorCompletion) bit */ -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:11:25 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:11:25 +0300 Subject: [ofa-general][PATCH 12/12 v1] mlx4: QP to ready Message-ID: <480F519D.6060101@mellanox.co.il> >From eda80652876695342a68fd2e47d45d1c57d8b511 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Wed, 23 Apr 2008 16:20:42 +0300 Subject: [PATCH] mlx4: Qp to ready Added API to bring a QP from Reset to RTS state. 
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/qp.c | 30 ++++++++++++++++++++++++++++++ include/linux/mlx4/qp.h | 4 ++++ 2 files changed, 34 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index 2d5be15..a6ed9ca 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -366,3 +366,33 @@ int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, } EXPORT_SYMBOL_GPL(mlx4_qp_query); +int mlx4_qp_to_ready(struct mlx4_dev *dev, struct mlx4_mtt *mtt, + struct mlx4_qp_context *context, + struct mlx4_qp *qp, enum mlx4_qp_state *qp_state) +{ +#define CLEAR_STATE_MASK 0xfffffff + int err = 0; + int i; + enum mlx4_qp_state states[] = { + MLX4_QP_STATE_RST, + MLX4_QP_STATE_INIT, + MLX4_QP_STATE_RTR, + MLX4_QP_STATE_RTS + }; + + for (i = 0; i < ARRAY_SIZE(states) - 1; i++) { + context->flags &= cpu_to_be32(CLEAR_STATE_MASK); + context->flags |= cpu_to_be32(states[i+1] << 28); + err = mlx4_qp_modify(dev, mtt, states[i], + states[i+1], context, 0, 0, qp); + if (err) { + mlx4_err(dev, "Failed to bring qp to state:" + "%d with error: %d\n", + states[i+1], err); + return err; + } + *qp_state = states[i+1]; + } + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_to_ready); diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index d7c0227..96b0e1b 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -305,6 +305,10 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context); +int mlx4_qp_to_ready(struct mlx4_dev *dev, struct mlx4_mtt *mtt, + struct mlx4_qp_context *context, + struct mlx4_qp *qp, enum mlx4_qp_state *qp_state); + static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn) { return radix_tree_lookup(&dev->qp_table_tree, qpn & (dev->caps.num_qps - 1)); -- 1.5.4 From holt at sgi.com Wed Apr 23 08:45:36 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 23 Apr 2008 10:45:36 -0500 
Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423134427.GW24536@duo.random> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> Message-ID: <20080423154536.GV30298@sgi.com>

On Wed, Apr 23, 2008 at 03:44:27PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 04:14:26PM -0700, Christoph Lameter wrote:
> > We want a full solution and this kind of patching makes the patches
> > difficult to review because later patches revert earlier ones.
> I know you would rather see KVM development stalled for more months
> than get a partial solution now that already covers KVM and GRU
> with the same API that XPMEM will also use later. It's very unfair on
> your side to try to stall other people's development if what you
> need has stronger requirements and can't be merged immediately. This
> is especially true given it was publicly stated that XPMEM never
> passed all regression tests anyway, so you can't possibly be in such

XPMEM has passed all regression tests using your version 12 notifiers. I have a bug in xpmem which shows up on our 8x oversubscription tests, but that is clearly my bug to figure out. Unfortunately it only shows up on a 128-processor machine, so I have 1024 stack traces to sort through each time it fails. It does take a bit of time and a lot of concentration.

> a hurry like we are, we can't progress without this. In fact we can

SGI is under an equally strict timeline. We really needed the sleeping version in 2.6.26. We may still be able to get this accepted by vendor distros if we make 2.6.27.
Thanks, Robin From andrea at qumranet.com Wed Apr 23 08:59:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 17:59:40 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423144747.GU30298@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> Message-ID: <20080423155940.GY24536@duo.random>

On Wed, Apr 23, 2008 at 09:47:47AM -0500, Robin Holt wrote:
> It also makes the API consistent. What you are proposing is equivalent
> to having a file you can open but never close.

That's not entirely true: you can close the file just fine by killing the tasks, leading to an mmput. From a user perspective in KVM terms, it won't make a difference, as /dev/kvm will remain open and it'll pin the module count until the kvm task is killed anyway; I assume for GRU it's similar. Until I had the idea of how to implement an mm_lock to ensure mmu_notifier_register could not miss a running invalidate_range_begin, it wasn't even possible to implement an mmu_notifier_unregister (see EMM patches), and it looked like you were OK with that API that missed _unregister...

> This whole discussion seems ludicrous. You could refactor the code to get
> the sorted list of locks, pass that list into mm_lock to do the locking,
> do the register/unregister, then pass the same list into mm_unlock.

Correct, but it will keep the vmalloc RAM pinned during the runtime. There's no reason to keep that RAM allocated per-VM while the VM runs. We only need it during startup and teardown.

> If the allocation fails, you could fall back to the older slower method
> of repeatedly scanning the lists and acquiring locks in ascending order.

Correct, I had already thought about that. This is exactly why I'm deferring this for later!
Otherwise this perfectionism, which isn't needed for KVM/GRU, will keep indefinitely delaying the part that has already converged and that is enough for KVM and GRU (and for this specific bit, actually enough for XPMEM as well). We can make a second version, mm_lock_slow, to use if mm_lock fails in mmu_notifier_unregister, with N^2 complexity, later, after the mmu-notifier-core is merged into mainline.

> If you are not going to provide the _unregister callout you need to change
> the API so I can scan the list of notifiers to see if my structures are
> already registered.

As I said, 1/N isn't enough for XPMEM anyway. 1/N has to include only the absolute minimum, zero-risk stuff that is enough for both KVM and GRU.

> We register our notifier structure at device open time. If we receive a
> _release callout, we mark our structure as unregistered. At device close
> time, if we have not been unregistered, we call _unregister. If you
> take away _unregister, I have an xpmem kernel structure in use _AFTER_
> the device is closed with no indication that the process is using it.
> In that case, I need to get an extra reference to the module in my device
> open method and hold that reference until the _release callout.

Yes, exactly, but you have to do that anyway if mmu_notifier_unregister fails because some driver already allocated all vmalloc space (even x86-64 doesn't have an unlimited amount of vmalloc space, because vmalloc sits at the end of the address space), unless we have an N^2 fallback. But the N^2 fallback will make the code more easily DoS-able and unkillable, so if I were an admin I'd prefer having to quickly kill -9 a task in O(N) over having to wait for some syscall that runs in O(N^2) to complete before the task quits. So the fallback to a slower algorithm isn't necessarily what will really happen after 2.6.26 is released; we'll see. Relying on ->release for the module unpin sounds preferable, and it's certainly the only reliable way to unregister that we'll provide in 2.6.26.
> Additionally, if the user's program reopens the device, I need to scan the
> mmu_notifiers list to see if this task's notifier is already registered.

But you don't need to browse the list for this: keep a flag in your structure after the mmu_notifier struct, set the bitflag after mmu_notifier_register returns, and clear the bitflag after ->release runs or after mmu_notifier_unregister returns success. What's the big deal in tracking whether you have to call mmu_notifier_register a second time or not? Or you can create a new structure every time somebody asks to reattach.

> I view _unregister as essential. Did I miss something?

We can add it later, and we can keep discussing what the best model to implement it is, for as long as you want, after 2.6.26 is released with mmu-notifier-core, so GRU/KVM are done. It's unlikely KVM will use mmu_notifier_unregister anyway, as we need it attached for the whole lifetime of the task, and only for the lifetime of the task. This is the patch to add it; as you can see, it's entirely orthogonal, backwards compatible with the previous API, and it doesn't duplicate or rewrite any code. Don't worry, any kernel after 2.6.26 will have unregister, but we can't focus on this for 2.6.26. We can also consider making mmu_notifier_register safe against double calls on the same structure, but again that's not something we should be doing in 1/N, and it can be done later in a backwards-compatible way (plus we're perfectly fine with the API having non-backwards-compatible changes as long as 2.6.26 can work for us). --------------------------------- Implement unregister, but it's not reliable; only ->release is reliable.
Signed-off-by: Andrea Arcangeli

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -119,6 +119,8 @@
 extern int mmu_notifier_register(struct mmu_notifier *mn,
				 struct mm_struct *mm);
+extern int mmu_notifier_unregister(struct mmu_notifier *mn,
+				   struct mm_struct *mm);
 extern void __mmu_notifier_release(struct mm_struct *mm);
 extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
					    unsigned long address);
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -106,3 +106,29 @@
 	return ret;
 }
 EXPORT_SYMBOL_GPL(mmu_notifier_register);
+
+/*
+ * mm_users can't go down to zero while mmu_notifier_unregister()
+ * runs or it can race with ->release. So a mm_users pin must
+ * be taken by the caller (if mm can be different from current->mm).
+ *
+ * This function can fail (for example during out of memory conditions
+ * or after vmalloc virtual range shortage), so the only reliable way
+ * to unregister is to wait release() to be called.
+ */
+int mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct mm_lock_data data;
+	int ret;
+
+	BUG_ON(!atomic_read(&mm->mm_users));
+
+	ret = mm_lock(mm, &data);
+	if (unlikely(ret))
+		goto out;
+	hlist_del(&mn->hlist);
+	mm_unlock(mm, &data);
+out:
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mmu_notifier_unregister);

From michael.heinz at qlogic.com Wed Apr 23 09:08:28 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Wed, 23 Apr 2008 11:08:28 -0500 Subject: [ofa-general] Suggested patches to OFED RPM spec files Message-ID:

Installation of OFED 1.3.0.0.4 onto a Kusu/OCS cluster does not fully succeed because of some missing dependencies in the RPM spec files. This is because Kusu installs nodes over a network by presenting a pool of RPMs to be installed and letting RPM figure out the order to install them in.
Without the dependencies, we ended up with oddities like the kernel drivers being installed before the /usr/bin directory had been populated, causing the install script to fail. I was able to work around this by manually expanding some of the source RPM files, altering the spec file, and repackaging the source RPM. This allowed me to build binary RPMs (via the install script) that could be installed on a Kusu cluster.

Here are the proposed changes. If there is a better/preferred way of submitting this suggestion, please let me know.

--- ../../original/ib-bonding.spec	2008-04-22 12:54:12.000000000 -0400
+++ ib-bonding.spec	2008-04-22 12:43:07.000000000 -0400
@@ -20,6 +20,7 @@
 Group    : Applications/System
 License  : GPL
 BuildRoot: %{_tmppath}/%{name}-%{version}-root
+PreReq   : coreutils

 %description
 This package provides a bonding device which is capable of enslaving

--- ../../original/ofa_kernel.spec	2008-04-22 12:54:13.000000000 -0400
+++ ofa_kernel.spec	2008-04-22 12:45:40.000000000 -0400
@@ -111,6 +111,9 @@
 BuildRequires: sysfsutils-devel

 %package -n kernel-ib
+PreReq: coreutils
+PreReq: kernel
+PreReq: pciutils
 Version: %{_version}
 Release: %{krelver}
 Summary: Infiniband Driver and ULPs kernel modules
@@ -119,6 +122,10 @@
 Core, HW and ULPs kernel modules

 %package -n kernel-ib-devel
+PreReq: coreutils
+PreReq: kernel
+PreReq: pciutils
+Requires: kernel-ib
 Version: %{_version}
 Release: %{krelver}
 Summary: Infiniband Driver and ULPs kernel modules sources

--- ../../original/open-iscsi-generic.spec	2008-04-22 12:54:13.000000000 -0400
+++ open-iscsi-generic.spec	2008-04-22 12:42:33.000000000 -0400
@@ -21,6 +21,7 @@
 %define kversion $(uname -r | sed "s/-ppc64\|-smp//")

 %package -n iscsi-initiator-utils
+PreReq: coreutils
 Summary : iSCSI daemon and utility programs
 Group : System Environment/Daemons
 %description -n iscsi-initiator-utils
@@ -30,6 +31,7 @@
 Protocol networks.

 %package -n open-iscsi
+PreReq: coreutils
 Summary : Linux* Open-iSCSI Software Initiator
 Group : Productivity/Networking/Other
 %description -n open-iscsi

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

From andrea at qumranet.com Wed Apr 23 09:15:45 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 18:15:45 +0200 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423154536.GV30298@sgi.com> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> <20080423154536.GV30298@sgi.com> Message-ID: <20080423161544.GZ24536@duo.random>

On Wed, Apr 23, 2008 at 10:45:36AM -0500, Robin Holt wrote:
> XPMEM has passed all regression tests using your version 12 notifiers.

That's great news, thanks! I'd greatly appreciate it if you could test #v13 too, as I posted it. It has already passed the GRU and KVM regression tests and it should work fine for XPMEM too. You can ignore the purely cosmetic error I managed to introduce in mm_lock_cmp (I implemented a BUG_ON that would have triggered if it weren't a purely cosmetic issue, and it clearly doesn't trigger, so you can be sure it's only cosmetic ;). Once I get confirmation that everyone is OK with #v13, I'll push a #v14 before Saturday with that cosmetic error cleaned up and mmu_notifier_unregister moved to the end (XPMEM will have unregister, don't worry). I expect 1/13 of #v14 to go into -mm and then 2.6.26.

> I have a bug in xpmem which shows up on our 8x oversubscription tests,
> but that is clearly my bug to figure out. Unfortunately it only shows

This is what I meant.
We, in contrast, don't have any known bug left in this area; in fact, we need mmu notifiers to _fix_ issues I identified that can't be fixed efficiently without them, and we need the mmu notifiers to go productive ASAP.

> up on a 128 processor machine so I have 1024 stack traces to sort
> through each time it fails. Does take a bit of time and a lot of
> concentration.

Sure, hope you find it soon!

> SGI is under an equally strict timeline. We really needed the sleeping
> version into 2.6.26. We may still be able to get this accepted by
> vendor distros if we make 2.6.27.

I don't think vendor distros are any less likely to take patches 2-12 if 1/N (aka mmu-notifier-core) is merged in 2.6.26, especially in light of kABI.

From andrea at qumranet.com Wed Apr 23 09:26:29 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 18:26:29 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> Message-ID: <20080423162629.GB24536@duo.random>

On Tue, Apr 22, 2008 at 04:20:35PM -0700, Christoph Lameter wrote:
> I guess I have to prepare another patchset then?

If you want to embarrass yourself three times in a row, go ahead ;). I thought two failed takeovers were enough.
From michaelc at cs.wisc.edu Wed Apr 23 09:33:49 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 23 Apr 2008 11:33:49 -0500 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <480F3C84.40606@Voltaire.COM> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu><1202850645.3137.132.camel@localhost.localdomain><20080212214632.GA14397@osc.edu><1202853468.3137.148.camel@localhost.localdomain><20080213195912.GC7372@osc.edu> <480C9BF8.9050401@Voltaire.COM> <480F3C84.40606@Voltaire.COM> Message-ID: <480F64ED.7010705@cs.wisc.edu> Erez Zilber wrote: > Erez Zilber wrote: >> Pete Wyckoff wrote: >>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 >> 15:57 -0600: >>> >>>> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote: >>>> >>>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 >> 15:10 -0600: >>>>> >>>>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote: >>>>>> >>>>>>> iscsi_iser does not have any hardware DMA restrictions. Add a >>>>>>> slave_configure function to remove any DMA alignment restriction, >>>>>>> allowing the use of direct IO from arbitrary offsets within a page. >>>>>>> Also disable page bouncing; iser has no restrictions on which >> pages it >>>>>>> can address. 
>>>>>>> >>>>>>> Signed-off-by: Pete Wyckoff >>>>>>> --- >>>>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ >>>>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c >> b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>> index be1b9fb..1b272a6 100644 >>>>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) >>>>>>> iser_conn_terminate(ib_conn); >>>>>>> } >>>>>>> >>>>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) >>>>>>> +{ >>>>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); >>>>>>> >>>>>> You really don't want to do this. That signals to the block >> layer that >>>>>> we have an iommu, although it's practically the same thing as a >> 64 bit >>>>>> DMA mask ... but I'd just leave it to the DMA mask to set this up >>>>>> correctly. Anything else is asking for a subtle bug to turn up years >>>>>> from now when something causes the mask and the limit to be >> mismatched. >>>>>> >>>>> Oh. I decided to add that line for symmetry with TCP, and was >>>>> convinced by the arguments here: >>>>> >>>>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 >>>>> Author: Mike Christie >>>>> Date: Thu Jul 26 12:46:47 2007 -0500 >>>>> >>>>> [SCSI] iscsi_tcp: Turn off bounce buffers >>>>> >>>>> It was found by LSI that on setups with large amounts of memory >>>>> we were bouncing buffers when we did not need to. If the iscsi tcp >>>>> code touches the data buffer (or a helper does), >>>>> it will kmap the buffer. iscsi_tcp also does not interact with >> hardware, >>>>> so it does not have any hw dma restrictions. This patch sets >> the bounce >>>>> buffer settings for our device queue so buffers should not be >> bounced >>>>> because of a driver limit. 
>>>>> >>>>> I don't see a convenient place to callback into particular iscsi >>>>> devices to set the DMA mask per-host. It has to go on the >>>>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which >>>>> handles its DMA mask during device probe. >>>>> >>>> You should be taking your mask from the underlying infiniband device as >>>> part of the setup, shouldn't you? >>>> >>> I think you're right about this. All the existing IB HW tries to >>> set a 64-bit dma mask, but that's no reason to disable the mechanism >>> entirely in iser. I'll remove that line that disables bouncing in >>> my patch. Perhaps Mike will know if the iscsi_tcp usage is still >>> appropriate. >>> >>> >> Let me make sure that I understand: you say that the IB HW driver (e.g. >> ib_mthca) tries to set a 64-bit dma mask: >> >> err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); >> if (err) { >> dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA >> mask.\n"); >> err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); >> if (err) { >> dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); >> goto err_free_res; >> } >> } >> >> So, in the example above, the driver will use a 64-bit mask or a 32-bit >> mask (or fail). According to that, iSER (and SRP) needs to call >> blk_queue_bounce_limit with the appropriate parameter, right? >> > > Roland, James, > > I'm trying to fix this potential problem in iSER, and I have some > questions about that. How can I get the DMA mask that the HCA driver is > using (DMA_64BIT_MASK or DMA_32BIT_MASK)? Can I get it somehow from > struct ib_device? Is it in ib_device->device? I think what Erez is asking, or maybe it is something I was wondering is, that scsi drivers like lpfc or qla2xxx will do something like: if (dma_set_mask(&scsi_host->pdev->dev, DMA_64BIT_MASK)) dma_set_mask(&scsi_host->pdev->dev, DMA_32BIT_MASK) And when __scsi_alloc_queue calls scsi_calculate_bounce_limit it checks the host's parent dma_mask and sets the bounce_limit for the driver. 
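The alternative being weighed here, computing the bounce limit in the driver itself instead of relying on the scsi_host's parent device, could look roughly like the following. This is a sketch only, using a hypothetical `iser_sdev_to_ib_device()` helper and circa-2008 kernel APIs; it is not actual iscsi_iser code:

```c
/* Sketch only: hypothetical helper, not actual iscsi_iser code. */
static int iscsi_iser_slave_configure(struct scsi_device *sdev)
{
	/* Reach the HCA's struct device, where the PCI driver stored
	 * the DMA mask via pci_set_dma_mask(). */
	struct ib_device *ib_dev = iser_sdev_to_ib_device(sdev);
	u64 mask = *ib_dev->dma_device->dma_mask;

	/* Mirror scsi_calculate_bounce_limit(): a full 64-bit mask
	 * means no page ever needs bouncing; anything narrower keeps
	 * the conservative highmem boundary. */
	blk_queue_bounce_limit(sdev->request_queue,
			       mask == DMA_64BIT_MASK ?
					BLK_BOUNCE_ANY : BLK_BOUNCE_HIGH);
	return 0;
}
```

Setting the scsi_host's parent to the ib_device instead would let the SCSI midlayer derive the same limit automatically, which is the trade-off discussed in the rest of the thread.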
Does srp/iser need to call the dma_set_mask functions, or does the ib_device's device already have the DMA info set up?

> Another question is - after I get the DMA mask data from the HCA driver,
> I guess that I need to call blk_queue_bounce_limit with the appropriate
> parameter (BLK_BOUNCE_HIGH, BLK_BOUNCE_ANY or BLK_BOUNCE_ISA). Which
> value should iSER use according to the DMA mask info? For example, if
> the HCA driver sets DMA_64BIT_MASK, should iSER use
> BLK_BOUNCE_HIGH/BLK_BOUNCE_ANY/BLK_BOUNCE_ISA ?

Have you seen how the scsi layer calls blk_queue_bounce_limit when you have a parent device that is set up?

In the bnx2i branch I modified iser to be more like srp and traditional drivers, because it accesses the ib_device similarly to how other drivers like lpfc or qla2xxx access their parent device for DMA functions, and when the underlying device is removed we now remove the sessions like other hotplug drivers do (we remove sessions from the ib_client remove callout, like srp). In the branch, iser allocates a scsi_host per ib_device, and the scsi_host's parent is the ib_device (..../ib_device/scsi_host/iscsi_session/scsi_target/scsi_device), so if the dma_mask is set right, then the bounce_limit will be set by scsi_calculate_bounce_limit? An alternative could be to keep allocating a host per session, but just call blk_queue_bounce_limit in the scsi_host_template->slave_alloc function?

From andrea at qumranet.com Wed Apr 23 09:37:13 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 18:37:13 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423002848.GA32618@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> Message-ID: <20080423163713.GC24536@duo.random>

On Tue, Apr 22, 2008 at 07:28:49PM -0500, Jack Steiner wrote:
> The GRU driver unregisters the notifier when all GRU mappings
> are unmapped.
> I could make it work either way - either with or without
> an unregister function. However, unregister is the most logical
> action to take when all mappings have been destroyed.

This is true for KVM as well: unregister would be the most logical action to take when the kvm device is closed and the vm is destroyed. However, we can't implement mm_lock in O(N*log(N)) without triggering RAM allocations, and the sizes of those RAM allocations are unknown at the time unregister runs (they also depend on the max_nr_vmas sysctl). So on second thought, not even passing the array from register to unregister would solve it (unless we allocate max_nr_vmas entries up front and block the sysctl from altering max_nr_vmas while not all unregisters have run yet). That's clearly unacceptable.

The only way to avoid failing because of vmalloc space shortage or oom would be to provide an O(N*N) fallback, but one that can't be interrupted by sigkill! Sigkill interruption was OK in #v12 because we didn't rely on mmu_notifier_unregister to succeed; it avoided any DoS, but it still couldn't provide a reliable unregister.

So in the end, unregistering with kill -9 leading to ->release in O(1) sounds like the safer solution for the long term. You can't loop when unregister fails and still pretend your module has no deadlocks. Yes, waiting for ->release adds a bit of complexity, but I think it's worth it, and there haven't been any brilliant ideas yet on how to avoid both the O(N*N) complexity and the allocations in mmu_notifier_unregister. Until such an idea materializes, we'll stick with ->release in O(1) as the only safe unregister, so we guarantee the admin stays in control of his hardware in O(1) with kill -9, no matter whether /dev/kvm and /dev/gru are owned by sillyuser. I'm afraid that if you don't want to worst-case unregister with ->release, you need a better idea than my mm_lock, and personally I can't see any other way than mm_lock to ensure we don't miss a range_begin...
All the above is in 2.6.27 context (for 2.6.26 ->release is the way, even if the genius idea would materialize). From steiner at sgi.com Wed Apr 23 10:09:09 2008 From: steiner at sgi.com (Jack Steiner) Date: Wed, 23 Apr 2008 12:09:09 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080423170909.GA1459@sgi.com> You may have spotted this already. If so, just ignore this. It looks like there is a bug in copy_page_range() around line 667. It's possible to do a mmu_notifier_invalidate_range_start(), then return -ENOMEM w/o doing a corresponding mmu_notifier_invalidate_range_end(). --- jack From michaelc at cs.wisc.edu Wed Apr 23 10:16:10 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 23 Apr 2008 12:16:10 -0500 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <480F64ED.7010705@cs.wisc.edu> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu><1202850645.3137.132.camel@localhost.localdomain><20080212214632.GA14397@osc.edu><1202853468.3137.148.camel@localhost.localdomain><20080213195912.GC7372@osc.edu> <480C9BF8.9050401@Voltaire.COM> <480F3C84.40606@Voltaire.COM> <480F64ED.7010705@cs.wisc.edu> Message-ID: <480F6EDA.9050004@cs.wisc.edu> Mike Christie wrote: > Erez Zilber wrote: >> Erez Zilber wrote: >>> Pete Wyckoff wrote: >>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 >>> 15:57 -0600: >>>> >>>>> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote: >>>>> >>>>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 >>> 15:10 -0600: >>>>>> >>>>>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote: >>>>>>> >>>>>>>> iscsi_iser does not have any hardware DMA restrictions. Add a >>>>>>>> slave_configure function to remove any DMA alignment restriction, >>>>>>>> allowing the use of direct IO from arbitrary offsets within a page. 
>>>>>>>> Also disable page bouncing; iser has no restrictions on which >>> pages it >>>>>>>> can address. >>>>>>>> >>>>>>>> Signed-off-by: Pete Wyckoff >>>>>>>> --- >>>>>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ >>>>>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c >>> b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>> index be1b9fb..1b272a6 100644 >>>>>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) >>>>>>>> iser_conn_terminate(ib_conn); >>>>>>>> } >>>>>>>> >>>>>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) >>>>>>>> +{ >>>>>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); >>>>>>>> >>>>>>> You really don't want to do this. That signals to the block >>> layer that >>>>>>> we have an iommu, although it's practically the same thing as a >>> 64 bit >>>>>>> DMA mask ... but I'd just leave it to the DMA mask to set this up >>>>>>> correctly. Anything else is asking for a subtle bug to turn up >>>>>>> years >>>>>>> from now when something causes the mask and the limit to be >>> mismatched. >>>>>>> >>>>>> Oh. I decided to add that line for symmetry with TCP, and was >>>>>> convinced by the arguments here: >>>>>> >>>>>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 >>>>>> Author: Mike Christie >>>>>> Date: Thu Jul 26 12:46:47 2007 -0500 >>>>>> >>>>>> [SCSI] iscsi_tcp: Turn off bounce buffers >>>>>> >>>>>> It was found by LSI that on setups with large amounts of memory >>>>>> we were bouncing buffers when we did not need to. If the iscsi >>>>>> tcp >>>>>> code touches the data buffer (or a helper does), >>>>>> it will kmap the buffer. iscsi_tcp also does not interact with >>> hardware, >>>>>> so it does not have any hw dma restrictions. 
This patch sets >>> the bounce >>>>>> buffer settings for our device queue so buffers should not be >>> bounced >>>>>> because of a driver limit. >>>>>> >>>>>> I don't see a convenient place to callback into particular iscsi >>>>>> devices to set the DMA mask per-host. It has to go on the >>>>>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which >>>>>> handles its DMA mask during device probe. >>>>>> >>>>> You should be taking your mask from the underlying infiniband >>>>> device as >>>>> part of the setup, shouldn't you? >>>>> >>>> I think you're right about this. All the existing IB HW tries to >>>> set a 64-bit dma mask, but that's no reason to disable the mechanism >>>> entirely in iser. I'll remove that line that disables bouncing in >>>> my patch. Perhaps Mike will know if the iscsi_tcp usage is still >>>> appropriate. >>>> >>>> >>> Let me make sure that I understand: you say that the IB HW driver (e.g. >>> ib_mthca) tries to set a 64-bit dma mask: >>> >>> err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); >>> if (err) { >>> dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA >>> mask.\n"); >>> err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); >>> if (err) { >>> dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); >>> goto err_free_res; >>> } >>> } >>> >>> So, in the example above, the driver will use a 64-bit mask or a 32-bit >>> mask (or fail). According to that, iSER (and SRP) needs to call >>> blk_queue_bounce_limit with the appropriate parameter, right? >>> >> >> Roland, James, >> >> I'm trying to fix this potential problem in iSER, and I have some >> questions about that. How can I get the DMA mask that the HCA driver is >> using (DMA_64BIT_MASK or DMA_32BIT_MASK)? Can I get it somehow from >> struct ib_device? Is it in ib_device->device? 
> I think what Erez is asking, or maybe it is something I was wondering
> is, that scsi drivers like lpfc or qla2xxx will do something like:
>
> if (dma_set_mask(&scsi_host->pdev->dev, DMA_64BIT_MASK))
>         dma_set_mask(&scsi_host->pdev->dev, DMA_32BIT_MASK)
>
> And when __scsi_alloc_queue calls scsi_calculate_bounce_limit it checks
> the host's parent dma_mask and sets the bounce_limit for the driver.
>
> Does srp/iser need to call the dma_set_mask functions or does the
> ib_device's device already have the dma info set up?

Never mind - I misread the mail. We know the IB HW driver sets the mask. I guess what we are debating is whether we should set the scsi_host's parent to the ib_device so the DMA mask is picked up, or whether we should just set the limit ourselves in our slave_configure by calling blk_queue_bounce_limit. And if we use the blk_queue_bounce_limit path, what function do we call to get the dma_mask?

I also modified iser to allocate a host per ib_device so it works like other scsi drivers, since we know the parent. Is this preferred over the host-per-session style? Does it matter? bnx2i works similarly to iser in that it uses libiscsi and does DMA against a real device. Should it do a host per session or a host per netdev? And if we do not allocate a host per ib_device/netdevice, what should we allocate per those structs? Should we create our own?

From andrea at qumranet.com Wed Apr 23 10:24:32 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 19:24:32 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423162629.GB24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080423162629.GB24536@duo.random> Message-ID: <20080423172432.GE24536@duo.random>

On Wed, Apr 23, 2008 at 06:26:29PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 04:20:35PM -0700, Christoph Lameter wrote:
> > I guess I have to prepare another patchset then?
Apologies for my previous not-too-polite comment in answer to the above, but I thought this double patchset was over now that you converged on #v12 and obsoleted EMM, and after the last private discussions. There's nothing personal here on my side, just a bit of general frustration on this matter.

I appreciate all the great contributions from you, most recently your idea to use sort(), but I can't really see any possible benefit or justification anymore in keeping two patchsets floating around, given we already converged on the mmu-notifier-core and given it's almost certain mmu-notifier-core will go into -mm in time for 2.6.26. Let's put it this way: if I fail to merge mmu-notifier-core into 2.6.26, I'll voluntarily give up my entire patchset and leave maintainership to you, so you can move 1/N to N/N and remove the mm_lock-sem patch (everything else can remain the same, as it's all orthogonal, so changing the order is a matter of minutes).

From rdreier at cisco.com Wed Apr 23 10:24:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 10:24:47 -0700 Subject: [ofa-general] Re: [PATCH 1/1 v1] MLX4: Added resize_cq capability. In-Reply-To: <480F428C.7080701@dev.mellanox.co.il> (Vladimir Sokolovsky's message of "Wed, 23 Apr 2008 17:07:08 +0300") References: <47E923CA.90804@dev.mellanox.co.il> <47F0A5A5.2010208@dev.mellanox.co.il> <480F428C.7080701@dev.mellanox.co.il> Message-ID:

Yikes - thanks, applied. Sorry for messing up your original patch. Which reminds me... I need to push out the libmlx4 side of things... I'll do that today; please test it when I do.
Thanks, Roland From sean.hefty at intel.com Wed Apr 23 10:25:44 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 23 Apr 2008 10:25:44 -0700 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> Message-ID: <000001c8a567$0fbfed30$b037170a@amr.corp.intel.com> >> * Use some standard address mapping protocol that I'm not aware of. >> * Use global IB service resolution. >> * Define/extend an address resolution protocol that operates over IP. >> * Define/extend an address resolution protocol that operates over UDP. >> >> I'm hoping that someone has a wonderfully brilliant idea for this >> that would take about 1 day to implement. :) >> >> - Sean > >Is it time to bring back ATS? > >http://lists.openfabrics.org/pipermail/general/2005-August/010247.html That's one possibility (option 2 above). But this needs to be global, not per subnet, so my personal preference (as of right now) is to avoid it. - Sean From hrosenstock at xsigo.com Wed Apr 23 10:32:37 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 23 Apr 2008 10:32:37 -0700 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> Message-ID: <1208971957.689.167.camel@hrosenstock-ws.xsigo.com> Sean, On Tue, 2008-04-22 at 15:46 -0700, Sean Hefty wrote: > I have a need to start looking at possible ways to map IP address to GIDs when > crossing IP (and IB) subnets. This would be in addition to or replace the ARP > use by the rdma_cm. Is this in the context of IB routers and/or RDMA gateways, or something else ? -- Hal > Possibilities include: > > * Use some standard address mapping protocol that I'm not aware of. > * Use global IB service resolution. > * Define/extend an address resolution protocol that operates over IP. 
> * Define/extend an address resolution protocol that operates over UDP. > > I'm hoping that someone has a wonderfully brilliant idea for this that would > take about 1 day to implement. :) > > - Sean > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Wed Apr 23 10:37:04 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 10:37:04 -0700 Subject: [ofa-general] [PATCH/RFC] RDMA/nes: Remove volatile qualifier from struct nes_hw_cq.cq_vbase Message-ID: Remove the volatile qualifier from the cq_vbase member of struct nes_hw_cq, and add an rmb() in the one place where it looks like access order might make a difference. As usual, removing a volatile qualifier in a declaration is actually a bug fix, since a volatile qualifier is not sufficient to make sure that aggressively out-of-order CPUs don't reorder things and cause incorrect results. For example, a CPU might speculatively execute reads of other cqe fields before the NIC hardware has written those fields and before it has set the NES_CQE_VALID bit (even though those reads come after the test of the NES_CQE_VALID bit in program order), but then when the CPU actually executes the conditional test of the NES_CQE_VALID, the bit has been set, and the CPU will proceed with the results of the earlier speculative execution and end up using bogus data. 
This also gets rid of the warning:

drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_destroy_cq':
drivers/infiniband/hw/nes/nes_verbs.c:1978: warning: passing argument 3 of 'pci_free_consistent' discards qualifiers from pointer target type

Signed-off-by: Roland Dreier
---
 drivers/infiniband/hw/nes/nes_hw.h    | 2 +-
 drivers/infiniband/hw/nes/nes_verbs.c | 8 +++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h
index b7e2844..8f36e23 100644
--- a/drivers/infiniband/hw/nes/nes_hw.h
+++ b/drivers/infiniband/hw/nes/nes_hw.h
@@ -905,7 +905,7 @@ struct nes_hw_qp {
 };

 struct nes_hw_cq {
-	struct nes_hw_cqe volatile *cq_vbase;	/* PCI memory for host rings */
+	struct nes_hw_cqe *cq_vbase;	/* PCI memory for host rings */
 	void (*ce_handler)(struct nes_device *nesdev, struct nes_hw_cq *cq);
 	dma_addr_t cq_pbase;	/* PCI memory for host rings */
 	u16 cq_head;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index f9a5d43..ee74f7c 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -1976,7 +1976,7 @@ static int nes_destroy_cq(struct ib_cq *ib_cq)
 	if (nescq->cq_mem_size)
 		pci_free_consistent(nesdev->pcidev, nescq->cq_mem_size,
-				(void *)nescq->hw_cq.cq_vbase, nescq->hw_cq.cq_pbase);
+				nescq->hw_cq.cq_vbase, nescq->hw_cq.cq_pbase);
 	kfree(nescq);

 	return ret;
@@ -3610,6 +3610,12 @@ static int nes_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
 	while (cqe_count < num_entries) {
 		if (le32_to_cpu(nescq->hw_cq.cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX]) & NES_CQE_VALID) {
+			/*
+			 * Make sure we read CQ entry contents *after*
+			 * we've checked the valid bit.
+			 */
+			rmb();
+
 			cqe = nescq->hw_cq.cq_vbase[head];
 			nescq->hw_cq.cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX] = 0;
 			u32temp = le32_to_cpu(cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]);
--
1.5.5.1

From sean.hefty at intel.com Wed Apr 23 10:37:33 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 23 Apr 2008 10:37:33 -0700 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <1208971957.689.167.camel@hrosenstock-ws.xsigo.com> References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> <1208971957.689.167.camel@hrosenstock-ws.xsigo.com> Message-ID: <000201c8a568$b6ae57c0$b037170a@amr.corp.intel.com>

>On Tue, 2008-04-22 at 15:46 -0700, Sean Hefty wrote:
>> I have a need to start looking at possible ways to map IP address to GIDs
>> when crossing IP (and IB) subnets. This would be in addition to or replace
>> the ARP use by the rdma_cm.
>
>Is this in the context of IB routers and/or RDMA gateways, or something
>else ?

IB routers

From ralph.campbell at qlogic.com Wed Apr 23 10:43:03 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 23 Apr 2008 10:43:03 -0700 Subject: [ofa-general] [PATCH] IB/core - reset to error state transition not allowed Message-ID: <1208972583.2232.107.camel@brick.pathscale.com>

I was reviewing the QP state transition diagram in the IB 1.2.1 spec and the code for qp_state_table[], and noticed that the code allows a QP to be modified from IB_QPS_RESET to IB_QPS_ERR, whereas the notes for figure 124 (pg 457) specifically say that this transition isn't allowed.
Signed-off-by: Ralph Campbell

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 0504208..379239f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -317,7 +317,6 @@ static const struct {
 } qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = {
 	[IB_QPS_RESET] = {
 		[IB_QPS_RESET] = { .valid = 1 },
-		[IB_QPS_ERR] = { .valid = 1 },
 		[IB_QPS_INIT] = {
 			.valid = 1,
 			.req_param = {

From michaelc at cs.wisc.edu Wed Apr 23 10:43:30 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 23 Apr 2008 12:43:30 -0500 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <480F6EDA.9050004@cs.wisc.edu> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu><1202850645.3137.132.camel@localhost.localdomain><20080212214632.GA14397@osc.edu><1202853468.3137.148.camel@localhost.localdomain><20080213195912.GC7372@osc.edu> <480C9BF8.9050401@Voltaire.COM> <480F3C84.40606@Voltaire.COM> <480F64ED.7010705@cs.wisc.edu> <480F6EDA.9050004@cs.wisc.edu> Message-ID: <480F7542.4070000@cs.wisc.edu>

Mike Christie wrote:
> Mike Christie wrote:
>> Erez Zilber wrote:
>>> Erez Zilber wrote:
>>>> Pete Wyckoff wrote:
>>>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008
>>>>> 15:57 -0600:
>>>>>
>>>>>> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote:
>>>>>>
>>>>>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008
>>>>>>> 15:10 -0600:
>>>>>>>
>>>>>>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote:
>>>>>>>>
>>>>>>>>> iscsi_iser does not have any hardware DMA restrictions. Add a
>>>>>>>>> slave_configure function to remove any DMA alignment restriction,
>>>>>>>>> allowing the use of direct IO from arbitrary offsets within a
>>>>>>>>> page.
>>>>>>>>> Also disable page bouncing; iser has no restrictions on which
>>>>>>>>> pages it
>>>>>>>>> can address.
>>>>>>>>> >>>>>>>>> Signed-off-by: Pete Wyckoff >>>>>>>>> --- >>>>>>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ >>>>>>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>> b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>>> index be1b9fb..1b272a6 100644 >>>>>>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) >>>>>>>>> iser_conn_terminate(ib_conn); >>>>>>>>> } >>>>>>>>> >>>>>>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) >>>>>>>>> +{ >>>>>>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); >>>>>>>>> >>>>>>>> You really don't want to do this. That signals to the block >>>> layer that >>>>>>>> we have an iommu, although it's practically the same thing as a >>>> 64 bit >>>>>>>> DMA mask ... but I'd just leave it to the DMA mask to set this up >>>>>>>> correctly. Anything else is asking for a subtle bug to turn up >>>>>>>> years >>>>>>>> from now when something causes the mask and the limit to be >>>> mismatched. >>>>>>>> >>>>>>> Oh. I decided to add that line for symmetry with TCP, and was >>>>>>> convinced by the arguments here: >>>>>>> >>>>>>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 >>>>>>> Author: Mike Christie >>>>>>> Date: Thu Jul 26 12:46:47 2007 -0500 >>>>>>> >>>>>>> [SCSI] iscsi_tcp: Turn off bounce buffers >>>>>>> >>>>>>> It was found by LSI that on setups with large amounts of memory >>>>>>> we were bouncing buffers when we did not need to. If the >>>>>>> iscsi tcp >>>>>>> code touches the data buffer (or a helper does), >>>>>>> it will kmap the buffer. iscsi_tcp also does not interact with >>>> hardware, >>>>>>> so it does not have any hw dma restrictions. 
This patch sets >>>> the bounce >>>>>>> buffer settings for our device queue so buffers should not be >>>> bounced >>>>>>> because of a driver limit. >>>>>>> >>>>>>> I don't see a convenient place to callback into particular iscsi >>>>>>> devices to set the DMA mask per-host. It has to go on the >>>>>>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which >>>>>>> handles its DMA mask during device probe. >>>>>>> >>>>>> You should be taking your mask from the underlying infiniband >>>>>> device as >>>>>> part of the setup, shouldn't you? >>>>>> >>>>> I think you're right about this. All the existing IB HW tries to >>>>> set a 64-bit dma mask, but that's no reason to disable the mechanism >>>>> entirely in iser. I'll remove that line that disables bouncing in >>>>> my patch. Perhaps Mike will know if the iscsi_tcp usage is still >>>>> appropriate. >>>>> >>>>> >>>> Let me make sure that I understand: you say that the IB HW driver (e.g. >>>> ib_mthca) tries to set a 64-bit dma mask: >>>> >>>> err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); >>>> if (err) { >>>> dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA >>>> mask.\n"); >>>> err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); >>>> if (err) { >>>> dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); >>>> goto err_free_res; >>>> } >>>> } >>>> >>>> So, in the example above, the driver will use a 64-bit mask or a 32-bit >>>> mask (or fail). According to that, iSER (and SRP) needs to call >>>> blk_queue_bounce_limit with the appropriate parameter, right? >>>> >>> >>> Roland, James, >>> >>> I'm trying to fix this potential problem in iSER, and I have some >>> questions about that. How can I get the DMA mask that the HCA driver is >>> using (DMA_64BIT_MASK or DMA_32BIT_MASK)? Can I get it somehow from >>> struct ib_device? Is it in ib_device->device? 
>> >> I think what Erez is asking, or maybe it is something I was wondering >> is, that scsi drivers like lpfc or qla2xxx will do something like: >> >> if (dma_set_mask(&scsi_host->pdev->dev, DMA_64BIT_MASK)) >> dma_set_mask(&scsi_host->pdev->dev, DMA_32BIT_MASK) >> >> And when __scsi_alloc_queue calls scsi_calculate_bounce_limit it >> checks the host's parent dma_mask and sets the bounce_limit for the >> driver. >> >> Does srp/iser need to call the dma_set_mask functions or does the >> ib_device's device already have the dma info set up? > > Nevermind. I misread the mail. We know the ib hw driver sets the mask. I > guess what we are debating is whether we should set the scsi_host's parent > to the ib_device so the dma mask is picked up, or whether we should just set > them in our slave_configure by calling blk_queue_bounce_limit. And if we > use the blk_queue_bounce_limit path, what function do we call to get the > dma_mask. > Oh man, I should have looked at the code before posting. For this last part, if we do not set a correct host parent, I guess we have to just duplicate what scsi_calculate_bounce_limit does. It would be a waste to copy that code for iser. I guess we could modify scsi_calculate_bounce_limit somehow. From andrea at qumranet.com Wed Apr 23 10:45:50 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 19:45:50 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423170909.GA1459@sgi.com> References: <20080423170909.GA1459@sgi.com> Message-ID: <20080423174550.GF24536@duo.random> On Wed, Apr 23, 2008 at 12:09:09PM -0500, Jack Steiner wrote: > > You may have spotted this already. If so, just ignore this. > > It looks like there is a bug in copy_page_range() around line 667. > It's possible to do a mmu_notifier_invalidate_range_start(), then > return -ENOMEM w/o doing a corresponding mmu_notifier_invalidate_range_end(). No, I didn't spot it yet. Great catch!! ;) Thanks a lot.
I think we can take example from Jack and use our energy to spot any bug in the mmu-notifier-core, like with his auditing effort above (I'm quite certain you didn't reproduce this with a real oom ;), so we get a rock-solid mmu-notifier implementation in 2.6.26, so XPMEM will also benefit later in 2.6.27, and I hope the last XPMEM internal bugs will also be fixed by that time. (For those not going to become mmu-notifier users, there is nothing to worry about: unless you used KVM or GRU actively with mmu-notifiers, this bug would be entirely harmless with both MMU_NOTIFIER=n and =y, as previously guaranteed.) Here is the still-untested fix for review. diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -597,6 +597,7 @@ unsigned long next; unsigned long addr = vma->vm_start; unsigned long end = vma->vm_end; + int ret; /* * Don't copy ptes where a page fault will fill them correctly. @@ -604,33 +605,39 @@ * readonly mappings. The tradeoff is that copy_page_range is more * efficient than faulting.
*/ + ret = 0; if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { if (!vma->anon_vma) - return 0; + goto out; } - if (is_vm_hugetlb_page(vma)) - return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (unlikely(is_vm_hugetlb_page(vma))) { + ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); + goto out; + } if (is_cow_mapping(vma->vm_flags)) mmu_notifier_invalidate_range_start(src_mm, addr, end); + ret = 0; dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { next = pgd_addr_end(addr, end); if (pgd_none_or_clear_bad(src_pgd)) continue; - if (copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd, - vma, addr, next)) - return -ENOMEM; + if (unlikely(copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd, + vma, addr, next))) { + ret = -ENOMEM; + break; + } } while (dst_pgd++, src_pgd++, addr = next, addr != end); if (is_cow_mapping(vma->vm_flags)) mmu_notifier_invalidate_range_end(src_mm, - vma->vm_start, end); - - return 0; + vma->vm_start, end); +out: + return ret; } static unsigned long zap_pte_range(struct mmu_gather *tlb, From rdreier at cisco.com Wed Apr 23 10:55:42 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 10:55:42 -0700 Subject: [ofa-general] Re: [PATCH] IB/core - reset to error state transition not allowed In-Reply-To: <1208972583.2232.107.camel@brick.pathscale.com> (Ralph Campbell's message of "Wed, 23 Apr 2008 10:43:03 -0700") References: <1208972583.2232.107.camel@brick.pathscale.com> Message-ID: > I was reviewing the QP state transition diagram in the IB 1.2.1 > spec. and the code for qp_state_table[], and noticed that > the code allows a QP to be modified from IB_QPS_RESET to > IB_QPS_ERR whereas the notes for figure 124 (pg 457) > specifically says that this transition isn't allowed. This is a change from the 1.2 spec, which says: It is possible to transition from any state to either the Error state or the Reset state with the Modify QP/EE Verb. Does anyone know why this change was made? 
We specifically added code to some low-level drivers to handle RESET->ERROR transitions, so I guess someone cared (although maybe it was just for absolute spec compliance). - R. From clameter at sgi.com Wed Apr 23 11:02:18 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:02:18 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423134427.GW24536@duo.random> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > I know you would rather see KVM development stalled for more months > than get a partial solution now that already covers KVM and GRU > with the same API that XPMEM will also use later. It's very unfair on > your side to pretend to stall other people's development if what you > need has stronger requirements and can't be merged immediately. This > is especially true given it was publicly stated that XPMEM never > passed all regression tests anyway, so you can't possibly be in such > a hurry as we are; we can't progress without this. In fact we can, > but it would be a huge effort, it would run _slower_, and it would > all need to be deleted once mmu notifiers are in. We did this workaround effort years ago and have been suffering the ill effects of pinning for years. We had to deal with it again and again, so I guess we do not matter? Certainly we have no interest in stalling KVM development.
From clameter at sgi.com Wed Apr 23 11:09:35 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:09:35 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423155940.GY24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > Implement unregister but it's not reliable, only ->release is reliable. Why is there still the hlist stuff being used for the mmu notifier list? And why is this still unsafe? There are cases in which you do not take the reverse map locks or mmap_sem while traversing the notifier list? This hope for inclusion without proper review (first for .25 now for .26) seems to interfere with the patch cleanup work and cause delay after delay for getting the patch ready. On what basis do you think that there is a chance of any of these patches making it into 2.6.26 given that this patchset has never been vetted in Andrew's tree? From rdreier at cisco.com Wed Apr 23 11:14:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 11:14:37 -0700 Subject: [ofa-general] Re: [PATCH v1] libmlx4: Added resize CQ capability. In-Reply-To: <47F0A606.2060500@dev.mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 31 Mar 2008 11:51:18 +0300") References: <47E92539.7030908@dev.mellanox.co.il> <47F0A606.2060500@dev.mellanox.co.il> Message-ID: > + if ((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_CQE_OPCODE_RESIZE) > + goto repoll; seems like this can never happen in userspace, since we can hold the CQ lock the whole time the resize is in progress? > +int mlx4_get_outstanding_cqes(struct mlx4_cq *cq) > +{ > + int i; This needs to be unsigned I think to avoid undefined overflow issues... 
(although in practice I guess it probably doesn't matter) > + > + for (i = cq->cons_index; get_sw_cqe(cq, (i & cq->ibv_cq.cqe)); ++i) > + ; > + > + return i - cq->cons_index; > +} Anyway I deleted the changes to the polling path and updated the variable, and applied it. Please let me know if I messed something up... From clameter at sgi.com Wed Apr 23 11:15:16 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:15:16 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423162629.GB24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080423162629.GB24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 04:20:35PM -0700, Christoph Lameter wrote: > > I guess I have to prepare another patchset then? > > If you want to embarrass yourself three times in a row, go ahead ;). I > thought two failed takeovers were enough. Takeover? I'd be happy if I didn't have to deal with this issue. These patches were necessary because you were not listening to feedback, plus there is the issue that your patchsets were not easy to review or diff against. I had to merge several patches to get to a useful patch. You have always picked up lots of stuff from my patchsets. Lots of work that could have been avoided by proper patchsets in the first place. From andrea at qumranet.com Wed Apr 23 11:16:51 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:16:51 +0200 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> Message-ID: <20080423181651.GH24536@duo.random> On Wed, Apr 23, 2008 at 11:02:18AM -0700, Christoph Lameter wrote: > We have had this workaround effort done years ago and have been > suffering the ill effects of pinning for years. Had to deal with Yes.
In addition to the pinning, there's a lot of additional tlb flushing work to do in kvm without mmu notifiers, as the swapcache could be freed by the vm the instruction after put_page unpins the page, for whatever reason. From clameter at sgi.com Wed Apr 23 11:19:26 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:19:26 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423163713.GC24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > The only way to avoid failing because of vmalloc space shortage or > oom would be to provide an O(N*N) fallback. But one that can't be > interrupted by sigkill! sigkill interruption was ok in #v12 because we > didn't rely on mmu_notifier_unregister to succeed. So it avoided any > DoS but it still can't provide any reliable unregister. If unregister fails, then the driver should not detach from the address space immediately but wait until ->release is called. That may be a possible solution. It will be rare that the unregister fails. From andrea at qumranet.com Wed Apr 23 11:19:28 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:19:28 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> Message-ID: <20080423181928.GI24536@duo.random> On Wed, Apr 23, 2008 at 11:09:35AM -0700, Christoph Lameter wrote: > Why is there still the hlist stuff being used for the mmu notifier list? > And why is this still unsafe? What's the problem with hlist? It saves 8 bytes for each mm_struct; you should be using it too instead of list.
> There are cases in which you do not take the reverse map locks or mmap_sem > while traversing the notifier list? There aren't. > This hope for inclusion without proper review (first for .25 now for .26) > seems to interfere with the patch cleanup work and cause delay after delay > for getting the patch ready. On what basis do you think that there is a > chance of any of these patches making it into 2.6.26 given that this > patchset has never been vetted in Andrew's tree? Let's say I try to be optimistic and hope the right thing will happen given this is like a new driver that can't hurt anybody but KVM and GRU if there's any bug. But in my view what interfere with proper review for .26 are the endless discussions we're doing ;). From rdreier at cisco.com Wed Apr 23 11:21:28 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 11:21:28 -0700 Subject: [ofa-general] Re: [PATCH v1] libmlx4: Added resize CQ capability. In-Reply-To: <47F0A606.2060500@dev.mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 31 Mar 2008 11:51:18 +0300") References: <47E92539.7030908@dev.mellanox.co.il> <47F0A606.2060500@dev.mellanox.co.il> Message-ID: > + cqe = align_queue_size(cqe); Oh yeah... shouldn't this be cqe = align_queue_size(cqe + 1); to allow for resizing the CQ again later? I made that change when I applied. From clameter at sgi.com Wed Apr 23 11:21:49 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:21:49 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423172432.GE24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080423162629.GB24536@duo.random> <20080423172432.GE24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > will go in -mm in time for 2.6.26. 
Let's put it this way, if I fail to > merge mmu-notifier-core into 2.6.26 I'll voluntarily give up my entire > patchset and leave maintainership to you so you move 1/N to N/N and > remove mm_lock-sem patch (everything else can remain the same as it's > all orthogonal so changing the order is a matter of minutes). No I really want you to do this. I have no interest in a takeover in the future and have done the EMM stuff only because I saw no other way forward. I just want this be done the right way for all parties with patches that are nice and mergeable. From andrea at qumranet.com Wed Apr 23 11:25:06 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:25:06 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> Message-ID: <20080423182506.GJ24536@duo.random> On Wed, Apr 23, 2008 at 11:19:26AM -0700, Christoph Lameter wrote: > If unregister fails then the driver should not detach from the address > space immediately but wait until -->release is called. That may be > a possible solution. It will be rare that the unregister fails. This is the current idea, exactly. Unless we find a way to replace mm_lock with something else, I don't see a way to make mmu_notifier_unregister reliable without wasting ram. 
From clameter at sgi.com Wed Apr 23 11:27:21 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:27:21 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423181928.GI24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> <20080423181928.GI24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > On Wed, Apr 23, 2008 at 11:09:35AM -0700, Christoph Lameter wrote: > > Why is there still the hlist stuff being used for the mmu notifier list? > > And why is this still unsafe? > > What's the problem with hlist, it saves 8 bytes for each mm_struct, > you should be using it too instead of list. list heads in mm_struct and in the mmu_notifier struct seemed to be more consistent. We have no hash list after all. > > > There are cases in which you do not take the reverse map locks or mmap_sem > > while traversing the notifier list? > > There aren't. There is a potential issue in move_ptes where you call invalidate_range_end after dropping i_mmap_sem whereas my patches did the opposite. Mmap_sem saves you there? From andrea at qumranet.com Wed Apr 23 11:34:18 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:34:18 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> <20080423162629.GB24536@duo.random> <20080423172432.GE24536@duo.random> Message-ID: <20080423183418.GK24536@duo.random> On Wed, Apr 23, 2008 at 11:21:49AM -0700, Christoph Lameter wrote: > No I really want you to do this. I have no interest in a takeover in the Ok if you want me to do this, I definitely prefer the core to go in now. 
It's so much easier to concentrate on two problems at different times than to attack both problems at the same time, given they're mostly completely orthogonal problems. Given we already solved one problem, I'd like to close it before concentrating on the second problem. I already told you it was my interest to support XPMEM too. For example, it was me who noticed we couldn't possibly remove the can_sleep parameter from invalidate_range without altering the locking, as vmas were unstable outside of one of the three core vm locks. That finding resulted in much bigger patches than we hoped (like Andrew previously sort of predicted), and you did all great work to develop those. For my part, once the converged part is in, it'll be a lot easier to fully concentrate on the rest. My main focus right now is to produce an mmu-notifier-core that is entirely bug-free for .26. From andrea at qumranet.com Wed Apr 23 11:37:18 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:37:18 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> <20080423181928.GI24536@duo.random> Message-ID: <20080423183718.GL24536@duo.random> On Wed, Apr 23, 2008 at 11:27:21AM -0700, Christoph Lameter wrote: > There is a potential issue in move_ptes where you call > invalidate_range_end after dropping i_mmap_sem whereas my patches did the > opposite. Mmap_sem saves you there? Yes, there's really no risk of races in this area after introducing mm_lock; any place that mangles over ptes and doesn't hold any of the three locks is buggy anyway. I appreciate the audit work (I also did it and couldn't find bugs, but the more eyes the better).
From clameter at sgi.com Wed Apr 23 11:46:30 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:46:30 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423183718.GL24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> <20080423181928.GI24536@duo.random> <20080423183718.GL24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > Yes, there's really no risk of races in this area after introducing > mm_lock, any place that mangles over ptes and doesn't hold any of the > three locks is buggy anyway. I appreciate the audit work (I also did > it and couldn't find bugs but the more eyes the better). I guess I would need to merge some patches together somehow to be able to review them properly like I did before . I have not reviewed the latest code completely. From gstreiff at NetEffect.com Wed Apr 23 11:49:37 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Wed, 23 Apr 2008 13:49:37 -0500 Subject: [ofa-general] RE: [PATCH/RFC] RDMA/nes: Use print_mac() to format ethernet addresses for printing In-Reply-To: Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795012D@venom2> Acked-by: Glenn Streiff thanks! 
> Removing open-coded MAC formats shrinks the source and the generated > code too, eg on x86-64: > > add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-103 (-103) > function old new delta > make_cm_node 932 912 -20 > nes_netdev_set_mac_address 427 406 -21 > nes_netdev_set_multicast_list 1148 1124 -24 > nes_probe 2349 2311 -38 > > Signed-off-by: Roland Dreier > --- > drivers/infiniband/hw/nes/nes.c | 10 ++++------ > drivers/infiniband/hw/nes/nes_cm.c | 8 +++----- > drivers/infiniband/hw/nes/nes_nic.c | 18 ++++++++---------- > 3 files changed, 15 insertions(+), 21 deletions(-) > > diff --git a/drivers/infiniband/hw/nes/nes.c > b/drivers/infiniband/hw/nes/nes.c > index b046262..c0671ad 100644 > --- a/drivers/infiniband/hw/nes/nes.c > +++ b/drivers/infiniband/hw/nes/nes.c > @@ -353,13 +353,11 @@ struct ib_qp *nes_get_qp(struct > ib_device *device, int qpn) > */ > static void nes_print_macaddr(struct net_device *netdev) > { > - nes_debug(NES_DBG_INIT, "%s: MAC %02X:%02X:%02X:%02X:%02X:%02X, IRQ %u\n", > - netdev->name, > - netdev->dev_addr[0], netdev->dev_addr[1], netdev->dev_addr[2], > - netdev->dev_addr[3], netdev->dev_addr[4], netdev->dev_addr[5], > - netdev->irq); > -} > + DECLARE_MAC_BUF(mac); > > + nes_debug(NES_DBG_INIT, "%s: %s, IRQ %u\n", > + netdev->name, print_mac(mac, netdev->dev_addr), netdev->irq); > +} > > ... From gstreiff at NetEffect.com Wed Apr 23 11:52:48 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Wed, 23 Apr 2008 13:52:48 -0500 Subject: [ofa-general] RE: [PATCH/RFC] RDMA/nes: Print IPv4 addresses in a readable format In-Reply-To: Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795012E@venom2> Acked-by: Glenn Streiff thanks! > Use NIPQUAD_FMT instead of printing raw 32-bit hex quantities in > debugging output. 
> > Signed-off-by: Roland Dreier > --- > drivers/infiniband/hw/nes/nes.c | 5 +++-- > drivers/infiniband/hw/nes/nes_cm.c | 13 +++++++------ > drivers/infiniband/hw/nes/nes_utils.c | 4 +++- > 3 files changed, 13 insertions(+), 9 deletions(-) > > diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c > index c0671ad..a4e9269 100644 > --- a/drivers/infiniband/hw/nes/nes.c > +++ b/drivers/infiniband/hw/nes/nes.c > @@ -139,8 +139,9 @@ static int nes_inetaddr_event(struct notifier_block *notifier, > > addr = ntohl(ifa->ifa_address); > mask = ntohl(ifa->ifa_mask); > - nes_debug(NES_DBG_NETDEV, "nes_inetaddr_event: ip address %08X, netmask %08X.\n", > - addr, mask); > + nes_debug(NES_DBG_NETDEV, "nes_inetaddr_event: ip address " NIPQUAD_FMT > + ", netmask " NIPQUAD_FMT ".\n", > + HIPQUAD(addr), HIPQUAD(mask)); > list_for_each_entry(nesdev, &nes_dev_list, list) { > nes_debug(NES_DBG_NETDEV, "Nesdev list entry = 0x%p. (%s)\n", > nesdev, nesdev->netdev[0]->name); > ... From ralph.campbell at qlogic.com Wed Apr 23 12:03:53 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 23 Apr 2008 12:03:53 -0700 Subject: [ofa-general] Re: [PATCH] IB/core - reset to error state transition not allowed In-Reply-To: References: <1208972583.2232.107.camel@brick.pathscale.com> Message-ID: <1208977434.2232.112.camel@brick.pathscale.com> On Wed, 2008-04-23 at 10:55 -0700, Roland Dreier wrote: > > I was reviewing the QP state transition diagram in the IB 1.2.1 > > spec. and the code for qp_state_table[], and noticed that > > the code allows a QP to be modified from IB_QPS_RESET to > > IB_QPS_ERR whereas the notes for figure 124 (pg 457) > > specifically says that this transition isn't allowed. > > This is a change from the 1.2 spec, which says: > > It is possible to transition from any state to either the Error state > or the Reset state with the Modify QP/EE Verb. > > Does anyone know why this change was made? 
We specifically added code > to some low-level drivers to handle RESET->ERROR transitions, so I guess > someone cared (although maybe it was just for absolute spec compliance). > > - R. I didn't realize what a can of worms I opened :-) Personally, I don't think this will affect most applications either way. I posted the patch thinking it was an obvious bug. The only case that I think matters is some program which tries to verify the spec (pick one). From xavier at tddft.org Wed Apr 23 12:03:01 2008 From: xavier at tddft.org (Xavier Andrade) Date: Wed, 23 Apr 2008 21:03:01 +0200 (CEST) Subject: [ofa-general] Loading of ib_mthca fails Message-ID: Hi, I have the following problem with an InfiniBand adapter: when I try to load the kernel module ib_mthca I get the following error: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) ib_mthca: Initializing 0000:04:00.0 ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:04:00.0 to 64 ib_mthca 0000:04:00.0: MAP_FA returned status 0xff, aborting. ib_mthca 0000:04:00.0: Failed to start FW, aborting. ACPI: PCI interrupt for device 0000:04:00.0 disabled ib_mthca: probe of 0000:04:00.0 failed with error -22 The kernel (and IB driver) is stock 2.6.24.2 x86_64 and the distribution is Debian 4.0. The card is an Intel InfiniBand I/O Expansion module (AXXIBIOMOD) installed in an Intel S5000PAL motherboard. This is the PCI info of the adapter: 04:00.0 InfiniBand [0c06]: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] [15b3:6274] (rev a0) Does anyone know where the problem might be?
Thanks, Xavier From rdreier at cisco.com Wed Apr 23 12:20:02 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 12:20:02 -0700 Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: (Xavier Andrade's message of "Wed, 23 Apr 2008 21:03:01 +0200 (CEST)") References: Message-ID: > ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) > ib_mthca: Initializing 0000:04:00.0 > ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17 > PCI: Setting latency timer of device 0000:04:00.0 to 64 > ib_mthca 0000:04:00.0: MAP_FA returned status 0xff, aborting. > ib_mthca 0000:04:00.0: Failed to start FW, aborting. > ACPI: PCI interrupt for device 0000:04:00.0 disabled > ib_mthca: probe of 0000:04:00.0 failed with error -22 Strange, I'm not sure what's going on. Some firmware commands are succeeding and then one fails with a status that the firmware should never return. Taking a wild guess about what might be affecting this, how much memory does your system have installed? Can you make sure your kernel is built with CONFIG_INFINIBAND_MTHCA_DEBUG=y and then send the output of loading the driver with the debug_level module option set to 1? Thanks, Roland From holt at sgi.com Wed Apr 23 12:55:00 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 23 Apr 2008 14:55:00 -0500 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423161544.GZ24536@duo.random> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> <20080423154536.GV30298@sgi.com> <20080423161544.GZ24536@duo.random> Message-ID: <20080423195500.GW30298@sgi.com> On Wed, Apr 23, 2008 at 06:15:45PM +0200, Andrea Arcangeli wrote: > Once I get confirmation that everyone is ok with #v13 I'll push a #v14 > before Saturday with that cosmetical error cleaned up and > mmu_notifier_unregister moved at the end (XPMEM will have unregister > don't worry). 
I expect the 1/13 of #v14 to go in -mm and then 2.6.26. I think GRU needs _unregister as well. Thanks, Robin From weiny2 at llnl.gov Wed Apr 23 13:38:16 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Wed, 23 Apr 2008 13:38:16 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. Message-ID: <20080423133816.6c1b6315.weiny2@llnl.gov> Hey all, We have just started to experience a situation which I don't think is strictly a bug, but I think it could be fixed within the OFED software. The symptom is that nodes drop out of the IPoIB mcast group after a node temporarily goes catatonic. The details are: 1) Issues on a node cause a soft lockup of the node. 2) OpenSM does a normal light sweep. 3) MADs to the node time out since the node is in a "bad state". 4) OpenSM marks the node down and drops it from internal tables, including mcast groups. 5) Node recovers from the soft-lockup condition. 6) A subsequent sweep causes OpenSM to see the node and add it back to the fabric. 7) Node is fully functional on the verbs layer, but IPoIB never knew anything was wrong, so it does _not_ rejoin the mcast groups. (This is different from the condition where the link actually goes down.) As far as we can see there is nothing wrong with the node. It just went catatonic for a while. Obviously this is not a good condition; however, I was thinking of a couple of things which could be done to "fix" the above situation. I am writing here to see which solution might be best and accepted by the community. Alternatively this may have already been addressed. However, I don't see a bug in the bug list, nor do I find anything in the archive. Solutions I can think of are: A) Modify OpenSM to move the node to a "questionable" state for a period of X sweeps. If after X sweeps the node still does not respond, drop it. If the node does respond, return it to its original state.
B) When OpenSM queries the node as if it is new on the fabric and the SMA "thinks" it is not new, have the SMA detect this and notify the IPoIB layer (or ULPs in general) that something has gone wrong. The IPoIB layer could then check/rejoin the group. C) Put some code in IPoIB which might detect "lost cycles" and check/rejoin the mcast group. I have not worked out details for any solution. I believe that A and B are "outside the spec". However, I can see merit in A and B. Solution A would help if MADs are lost due to reasons other than node issues. (Perhaps a bad link, although I don't know of anyone having problems like that.) Solution B puts the solution closer to the original problem, but I am unsure how the SMA would know what is going on. Solution C is really close to the problem; however, I don't know how it would be done. I do think that this would be within the specification, as it really is the ULP's job to maintain its membership in the group. But how would it do this without help from the lower layers? (Of course it could poll for membership, but I think that is a bad idea.) Thoughts? 
Ira Weiny Lawrence Livermore National Lab weiny2 at llnl.gov From 12o3l at tiscali.nl Wed Apr 23 14:07:53 2008 From: 12o3l at tiscali.nl (Roel Kluin) Date: Wed, 23 Apr 2008 23:07:53 +0200 Subject: [ofa-general] [PATCH] ehca: ret is unsigned, ibmebus_request_irq() negative return ignored in hca_create_eq() Message-ID: <480FA529.2030800@tiscali.nl> diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c b/drivers/infiniband/hw/ehca/ehca_eq.c index b4ac617..9727235 100644 --- a/drivers/infiniband/hw/ehca/ehca_eq.c +++ b/drivers/infiniband/hw/ehca/ehca_eq.c @@ -59,6 +59,7 @@ int ehca_create_eq(struct ehca_shca *shca, u32 i; void *vpage; struct ib_device *ib_dev = &shca->ib_device; + int ret2; spin_lock_init(&eq->spinlock); spin_lock_init(&eq->irq_spinlock); @@ -123,18 +124,18 @@ int ehca_create_eq(struct ehca_shca *shca, /* register interrupt handlers and initialize work queues */ if (type == EHCA_EQ) { - ret = ibmebus_request_irq(eq->ist, ehca_interrupt_eq, + ret2 = ibmebus_request_irq(eq->ist, ehca_interrupt_eq, IRQF_DISABLED, "ehca_eq", (void *)shca); - if (ret < 0) + if (ret2 < 0) ehca_err(ib_dev, "Can't map interrupt handler."); tasklet_init(&eq->interrupt_task, ehca_tasklet_eq, (long)shca); } else if (type == EHCA_NEQ) { - ret = ibmebus_request_irq(eq->ist, ehca_interrupt_neq, + ret2 = ibmebus_request_irq(eq->ist, ehca_interrupt_neq, IRQF_DISABLED, "ehca_neq", (void *)shca); - if (ret < 0) + if (ret2 < 0) ehca_err(ib_dev, "Can't map interrupt handler."); tasklet_init(&eq->interrupt_task, ehca_tasklet_neq, (long)shca); From avi at qumranet.com Wed Apr 23 14:05:45 2008 From: avi at qumranet.com (Avi Kivity) Date: Thu, 24 Apr 2008 00:05:45 +0300 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423154536.GV30298@sgi.com> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> <20080423154536.GV30298@sgi.com> Message-ID: <480FA4A9.4090403@qumranet.com> Robin Holt 
wrote: >> an hurry like we are, we can't progress without this. Infact we can >> > > SGI is under an equally strict timeline. We really needed the sleeping > version into 2.6.26. We may still be able to get this accepted by > vendor distros if we make 2.6.27. > The difference is that the non-sleeping variant can be shown not to affect stability or performance, even if configured in, as long as it's not used. The sleeping variant will raise performance and stability concerns. I have zero objections to sleeping mmu notifiers; I only object to tying the schedules of the two together. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. From xavier at tddft.org Wed Apr 23 14:42:34 2008 From: xavier at tddft.org (Xavier Andrade) Date: Wed, 23 Apr 2008 23:42:34 +0200 (CEST) Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: References: Message-ID: Hi Roland, Thanks for your answer, On Wed, 23 Apr 2008, Roland Dreier wrote: > > Strange, I'm not sure what's going on. Some firmware commands are > succeeding and then one fails with a status that the firmware should > never return. > > Taking a wild guess about what might be affecting this, how much memory > does your system have installed? > 16 gigabytes. > Can you make sure your kernel is built with CONFIG_INFINIBAND_MTHCA_DEBUG=y > and then send the output of loading the driver with the debug_level > module option set to 1? > This is the output with debug_level set to 1: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) ib_mthca: Initializing 0000:04:00.0 ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:04:00.0 to 64 ib_mthca 0000:04:00.0: FW version 000000000000, max commands 1 ib_mthca 0000:04:00.0: Catastrophic error buffer at 0x0, size 0x0 ib_mthca 0000:04:00.0: FW size 0 KB ib_mthca 0000:04:00.0: Clear int @ 0, EQ arm @ 0, EQ set CI @ 0 Uhhuh. NMI received for unknown reason 31. 
Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue ib_mthca 0000:04:00.0: No HCA-attached memory (running in MemFree mode) ib_mthca 0000:04:00.0: Mapped 0 chunks/0 KB for FW. ib_mthca 0000:04:00.0: MAP_FA returned status 0xff, aborting. ib_mthca 0000:04:00.0: Failed to start FW, aborting. ACPI: PCI interrupt for device 0000:04:00.0 disabled ib_mthca: probe of 0000:04:00.0 failed with error -22 There are some extra messages because I enabled NMIs in the BIOS setup. Does this mean that the adapter doesn't have firmware? Cheers, Xavier From rdreier at cisco.com Wed Apr 23 14:52:07 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 14:52:07 -0700 Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: (Xavier Andrade's message of "Wed, 23 Apr 2008 23:42:34 +0200 (CEST)") References: Message-ID: > ib_mthca 0000:04:00.0: FW version 000000000000, max commands 1 > ib_mthca 0000:04:00.0: Catastrophic error buffer at 0x0, size 0x0 This is really weird -- we're getting all 0s back, like the HCA didn't write the response back to the right place. > Uhhuh. NMI received for unknown reason 31. which might cause this if the DMA goes to the wrong place. > Does this mean that the adapter doesn't have firmware? It is possible that the FW image is screwed up. You could use the Mellanox FW tools to make sure you have the right FW installed. But this doesn't have the flavor of that. Could you send the output of lspci -vvvnn? - R. 
From andrea at qumranet.com Wed Apr 23 15:19:28 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 24 Apr 2008 00:19:28 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423163713.GC24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> Message-ID: <20080423221928.GV24536@duo.random> On Wed, Apr 23, 2008 at 06:37:13PM +0200, Andrea Arcangeli wrote: > I'm afraid if you don't want to worst-case unregister with ->release > you need to have a better idea than my mm_lock and personally I can't > see any other way than mm_lock to ensure not to miss range_begin... But wait, mmu_notifier_register absolutely requires mm_lock to ensure that when kvm->arch.mmu_notifier_invalidate_range_count is zero (a large variable name, it'll get shorter, but this is to explain), really no cpu is in the middle of a range_begin/end critical section. That's why we have to take all the mm locks. But we couldn't care less if we unregister in the middle; unregister only needs to be sure that no cpu could possibly still be using the ram of the notifier allocated by the driver before returning. So I'll implement unregister in O(1) and without ram allocations using srcu, and that'll fix all issues with unregister. It'll return "void" to make it crystal clear it can't fail. It turns out unregister will make life easier for kvm as well, mostly to simplify the teardown of the /dev/kvm closure. Given this can be considered a bugfix to mmu_notifier_unregister I'll apply it to 1/N and I'll release a new mmu-notifier-core patch for you to review before I resend to Andrew before Saturday. Thanks! 
From xavier at tddft.org Wed Apr 23 16:11:49 2008 From: xavier at tddft.org (Xavier Andrade) Date: Thu, 24 Apr 2008 01:11:49 +0200 (CEST) Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: References: Message-ID: On Wed, 23 Apr 2008, Roland Dreier wrote: > Could you send the output of lspci -vvvnn? > This is the part relevant to the card (I attach the full output in case you need it): 04:00.0 InfiniBand [0c06]: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] [15b3:6274] (rev a0) Subsystem: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] [15b3:6274] Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- From rdreier at cisco.com Wed Apr 23 16:33:07 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 16:33:07 -0700 Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: (Xavier Andrade's message of "Thu, 24 Apr 2008 01:11:49 +0200 (CEST)") References: Message-ID: Hmm, not sure... let's see what the Mellanox guys say (they're mostly on vacation this week so it might be a few days). The only things I can think of to try are: - go to mellanox.com and get latest FW and make sure there's not anything strange about what's on your card (but given that it is seen by the driver, the FW must at least have a valid checksum I think) - if you're building your own kernel, try the Debian 2.6.24 generic amd64 image and see if that's any different, because I definitely have mt25204 HCAs working with that. - R. 
From hrosenstock at xsigo.com Wed Apr 23 17:05:14 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 23 Apr 2008 17:05:14 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080423133816.6c1b6315.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> Message-ID: <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote: > Hey all, > > We have just started to experience a situation which I don't think is strictly > a bug but I think could be fixed within the OFED software. > > The symptom is that nodes drop out of the IPoIB mcast group after a node > temporarily goes catatonic. The details are: > > 1) Issues on a node cause a soft lockup of the node. > 2) OpenSM does a normal light sweep. > 3) MADs to the node time out since the node is in a "bad state" > 4) OpenSM marks the node down and drops it from internal tables, including > mcast groups. > 5) Node recovers from soft lock up condition. > 6) A subsequent sweep causes OpenSM see the node and add it back to the > fabric. > 7) Node is fully functional on the verbs layer but IPoIB never knew anything > was wrong so it does _not_ rejoin the mcast groups. (This is different > from the condition where the link actually goes down.) > > As far as we can see there is nothing wrong with the node. It just went > catatonic for a while. Obviously this is not a good condition, however, I was > thinking of a couple of things which could be done to "fix" the above > situation. I am writing here to see which solution might be best, and accepted > by the community. Alternatively this may have already been addressed. > However, I don't see a bug in the bug list, nor do I find anything in the > archive. > > Solutions I can think of are: > > A) Modify OpenSM to move the node to a "questionable" state for a period of X > sweeps. If after X sweeps the node still does not respond, drop it. 
If > the node does respond return it to it's original state. > B) When OpenSM queries the node as if it is new on the fabric and the SMA > "thinks" it is not new, have the SMA detect this and notify the IPoIB > layer (or ULPs in general) that something has gone wrong. The IPoIB > layer could then check/rejoin the group. > C) put some code in IPoIB which might detect "lost cycles" and check/rejoin > the mcast group. > > I have not worked out details for any solution. I believe that A and B are > "outside the spec". However, I can see merit in A and B. > > Solution A would help if MAD's are lost due to reasons other than node issues. > (Perhaps a bad link. Although I don't know of anyone having problems like > that.) > > Solution B puts the solution closer to the original problem but I am unsure how > the SMA would know what is going on. > > Solution C is really close to the problem however I don't know how it would be > done. I do think that this would be within the specification as it really is > the ULP's job to maintain its membership in the group. But how would it do > this without help from the lower layers. (Of course it could poll for > membership but I think that is a bad idea.) > Thoughts? Having OpenSM request client reregistration (used in other places by OpenSM) of such nodes will resolve this issue. As little or as much policy can be built into OpenSM in determining "such" nodes to scope down the application of this mechanism for this case. 
-- Hal > Ira Weiny > Lawrence Livermore National Lab > weiny2 at llnl.gov > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Wed Apr 23 18:27:21 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 23 Apr 2008 18:27:21 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> Message-ID: <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-23 at 17:05 -0700, Hal Rosenstock wrote: > On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote: > > Hey all, > > > > We have just started to experience a situation which I don't think is strictly > > a bug but I think could be fixed within the OFED software. > > > > The symptom is that nodes drop out of the IPoIB mcast group after a node > > temporarily goes catatonic. The details are: > > > > 1) Issues on a node cause a soft lockup of the node. > > 2) OpenSM does a normal light sweep. > > 3) MADs to the node time out since the node is in a "bad state" > > 4) OpenSM marks the node down and drops it from internal tables, including > > mcast groups. > > 5) Node recovers from soft lock up condition. > > 6) A subsequent sweep causes OpenSM see the node and add it back to the > > fabric. > > 7) Node is fully functional on the verbs layer but IPoIB never knew anything > > was wrong so it does _not_ rejoin the mcast groups. (This is different > > from the condition where the link actually goes down.) > > > > As far as we can see there is nothing wrong with the node. It just went > > catatonic for a while. 
Obviously this is not a good condition, however, I was > > thinking of a couple of things which could be done to "fix" the above > > situation. I am writing here to see which solution might be best, and accepted > > by the community. Alternatively this may have already been addressed. > > However, I don't see a bug in the bug list, nor do I find anything in the > > archive. > > > > Solutions I can think of are: > > > > A) Modify OpenSM to move the node to a "questionable" state for a period of X > > sweeps. If after X sweeps the node still does not respond, drop it. If > > the node does respond return it to it's original state. > > B) When OpenSM queries the node as if it is new on the fabric and the SMA > > "thinks" it is not new, have the SMA detect this and notify the IPoIB > > layer (or ULPs in general) that something has gone wrong. The IPoIB > > layer could then check/rejoin the group. > > C) put some code in IPoIB which might detect "lost cycles" and check/rejoin > > the mcast group. > > > > I have not worked out details for any solution. I believe that A and B are > > "outside the spec". However, I can see merit in A and B. > > > > Solution A would help if MAD's are lost due to reasons other than node issues. > > (Perhaps a bad link. Although I don't know of anyone having problems like > > that.) > > > > Solution B puts the solution closer to the original problem but I am unsure how > > the SMA would know what is going on. > > > > Solution C is really close to the problem however I don't know how it would be > > done. I do think that this would be within the specification as it really is > > the ULP's job to maintain its membership in the group. But how would it do > > this without help from the lower layers. (Of course it could poll for > > membership but I think that is a bad idea.) > > > Thoughts? > > Having OpenSM request client reregistration (used in other places by > OpenSM) of such nodes will resolve this issue. 
As little or as much > policy can be built into OpenSM in determining "such" nodes to scope > down the application of this mechanism for this case. One side comment on the non OpenSM aspect of this: Why is the node temporarily unavailable ? There is a "contract" that the node makes with the SM that it clearly isn't honoring. Is any investigation going on relative to this aspect of the issue ? -- Hal > -- Hal > > > Ira Weiny > > Lawrence Livermore National Lab > > weiny2 at llnl.gov 
From jgunthorpe at obsidianresearch.com Wed Apr 23 22:42:35 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 23 Apr 2008 23:42:35 -0600 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> Message-ID: <20080424054235.GA11416@obsidianresearch.com> On Wed, Apr 23, 2008 at 09:56:50AM -0400, James Lentini wrote: > > I'm hoping that someone has a wonderfully brilliant idea for this > > that would take about 1 day to implement. :) > > Is it time to bring back ATS? > > http://lists.openfabrics.org/pipermail/general/2005-August/010247.html Could you post this someplace where people who are not a member of the DAT group can access it? Thanks, Jason 
From andrea at qumranet.com Wed Apr 23 23:49:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 24 Apr 2008 08:49:40 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423221928.GV24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> Message-ID: <20080424064753.GH24536@duo.random> On Thu, Apr 24, 2008 at 12:19:28AM +0200, Andrea Arcangeli wrote: > /dev/kvm closure. Given this can be a considered a bugfix to > mmu_notifier_unregister I'll apply it to 1/N and I'll release a new I'm not sure anymore this can be considered a bugfix, given how large a change this turned out to be in the locking and register/unregister/release behavior. Here is a full draft patch for review and testing. Works great with KVM so far at least... - mmu_notifier_register has to run on current->mm or on get_task_mm() (in the latter case it can mmput after mmu_notifier_register returns) - mmu_notifier_register in turn can't race against mmu_notifier_release as that runs in exit_mmap after the last mmput - mmu_notifier_unregister can run at any time, even after exit_mmap completed. No mm_count pin is required, it's taken automatically by register and released by unregister - mmu_notifier_unregister serializes against all mmu notifiers with srcu, and it serializes especially against a concurrent mmu_notifier_unregister with a mix of a spinlock and SRCU - the spinlock lets us keep track of who runs first between mmu_notifier_unregister and mmu_notifier_release; this makes life much easier for the driver to handle, as the driver is then guaranteed that ->release will run. 
- The first that runs executes the ->release method as well, after dropping the spinlock but before releasing the srcu lock - it was unsafe to unpin the module count from ->release, as release itself has to run the 'ret' instruction to return back to the mmu notifier code - the ->release method is mandatory, as it has to run before the pages are freed to zap all existing sptes - the one that arrives second between mmu_notifier_unregister and mmu_notifier_register waits for the first with srcu As said, this is a much larger change than I hoped, but as usual it can only affect KVM/GRU/XPMEM if something is wrong with this. I don't exclude that we'll have to back off to the previous mm_users model. The main issue with taking an mm_users pin is that filehandles associated with vmas aren't closed by exit() if the mm_users is pinned (that simply leaks ram with kvm). It looks more correct not to rely on the mm_users being >0 only in mmu_notifier_register. The other big change is that ->release is mandatory and always called by the first of mmu_notifier_unregister and mmu_notifier_release. Both mmu_notifier_unregister and mmu_notifier_release are slow paths, so taking a spinlock there is no big deal. Impact when the mmu notifiers are disarmed is unchanged. The interesting part of the kvm patch to test this change is below. After this last bit the KVM patch status is almost final; if this new mmu notifier update is remotely ok, I've another one that does the locking change to remove the page pin. 
+static void kvm_free_vcpus(struct kvm *kvm); +/* This must zap all the sptes because all pages will be freed then */ +static void kvm_mmu_notifier_release(struct mmu_notifier *mn, + struct mm_struct *mm) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + BUG_ON(mm != kvm->mm); + kvm_free_pit(kvm); + kfree(kvm->arch.vpic); + kfree(kvm->arch.vioapic); + kvm_free_vcpus(kvm); + kvm_free_physmem(kvm); + if (kvm->arch.apic_access_page) + put_page(kvm->arch.apic_access_page); +} + +static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { + .release = kvm_mmu_notifier_release, + .invalidate_page = kvm_mmu_notifier_invalidate_page, + .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, + .clear_flush_young = kvm_mmu_notifier_clear_flush_young, +}; + struct kvm *kvm_arch_create_vm(void) { struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); + int err; if (!kvm) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops; + err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm); + if (err) { + kfree(kvm); + return ERR_PTR(err); + } + return kvm; } @@ -3899,13 +3967,12 @@ static void kvm_free_vcpus(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { - kvm_free_pit(kvm); - kfree(kvm->arch.vpic); - kfree(kvm->arch.vioapic); - kvm_free_vcpus(kvm); - kvm_free_physmem(kvm); - if (kvm->arch.apic_access_page) - put_page(kvm->arch.apic_access_page); + /* + * kvm_mmu_notifier_release() will be called before + * mmu_notifier_unregister returns, if it didn't run + * already. + */ + mmu_notifier_unregister(&kvm->arch.mmu_notifier, kvm->mm); kfree(kvm); } Let's call this mmu notifier #v14-test1. 
Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1050,6 +1050,27 @@ unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +/* + * mm_lock will take mmap_sem writably (to prevent all modifications + * and scanning of vmas) and then also takes the mapping locks for + * each of the vma to lockout any scans of pagetables of this address + * space. This can be used to effectively holding off reclaim from the + * address space. + * + * mm_lock can fail if there is not enough memory to store a pointer + * array to all vmas. + * + * mm_lock and mm_unlock are expensive operations that may take a long time. + */ +struct mm_lock_data { + spinlock_t **i_mmap_locks; + spinlock_t **anon_vma_locks; + size_t nr_i_mmap_locks; + size_t nr_anon_vma_locks; +}; +extern int mm_lock(struct mm_struct *mm, struct mm_lock_data *data); +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -19,6 +19,7 @@ #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1)) struct address_space; +struct mmu_notifier_mm; #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS typedef atomic_long_t mm_counter_t; @@ -225,6 +226,9 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct mmu_notifier_mm *mmu_notifier_mm; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,251 @@ +#ifndef 
_LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; + +#ifdef CONFIG_MMU_NOTIFIER +#include + +struct mmu_notifier_mm { + struct hlist_head list; + struct srcu_struct srcu; + /* to serialize mmu_notifier_unregister against mmu_notifier_release */ + spinlock_t unregister_lock; +}; + +struct mmu_notifier_ops { + /* + * Called after all other threads have terminated and the executing + * thread is the only remaining execution thread. There are no + * users of the mm_struct remaining. + * + * If the methods are implemented in a module, the module + * can't be unloaded until release() is called. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired and are called only when the mmap_sem is held and/or + * the semaphores protecting the reverse maps. Both functions + * may sleep. The subsystem must guarantee that no additional + * references to the pages in the range established between + * the call to invalidate_range_start() and the matching call + * to invalidate_range_end(). 
+ * + * Invalidation of multiple concurrent ranges may be permitted + * by the driver or the driver may exclude other invalidation + * from proceeding by blocking on new invalidate_range_start() + * callback that overlap invalidates that are already in + * progress. Either way the establishment of sptes to the + * range can only be allowed if all invalidate_range_stop() + * function have been called. + * + * invalidate_range_start() is called when all pages in the + * range are still mapped and have at least a refcount of one. + * + * invalidate_range_end() is called when all pages in the + * range have been unmapped and the pages have been freed by + * the VM. + * + * The VM will remove the page table entries and potentially + * the page between invalidate_range_start() and + * invalidate_range_end(). If the page must not be freed + * because of pending I/O or other circumstances then the + * invalidate_range_start() callback (or the initial mapping + * by the driver) must make sure that the refcount is kept + * elevated. + * + * If the driver increases the refcount when the pages are + * initially mapped into an address space then either + * invalidate_range_start() or invalidate_range_end() may + * decrease the refcount. If the refcount is decreased on + * invalidate_range_start() then the VM can free pages as page + * table entries are removed. If the refcount is only + * droppped on invalidate_range_end() then the driver itself + * will drop the last refcount but it must take care to flush + * any secondary tlb before doing the final free on the + * page. Pages will no longer be referenced by the linux + * address space but may still be referenced by sptes until + * the last refcount is dropped. 
+ */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +/* + * The notifier chains are protected by mmap_sem and/or the reverse map + * semaphores. Notifier chains are only changed when all reverse maps and + * the mmap_sem locks are taken. + * + * Therefore notifier chains can only be traversed when either + * + * 1. mmap_sem is held. + * 2. One of the reverse map locks is held (i_mmap_sem or anon_vma->sem). + * 3. No other concurrent thread can access the list (release) + */ +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(mm->mmu_notifier_mm); +} + +extern int mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void mmu_notifier_unregister(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_mm_destroy(struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + 
unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ + mm->mmu_notifier_mm = NULL; +} + +static inline void mmu_notifier_mm_destroy(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_mm_destroy(mm); +} + +#define ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static 
inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +static inline void mmu_notifier_mm_destroy(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } @@ -395,6 +397,7 @@ BUG_ON(mm == &init_mm); mm_free_pgd(mm); destroy_context(mm); + mmu_notifier_mm_destroy(mm); free_mm(mm); } EXPORT_SYMBOL_GPL(__mmdrop); diff --git a/mm/Kconfig b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ -193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,5 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. 
*/ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -596,6 +597,7 @@ unsigned long next; unsigned long addr = vma->vm_start; unsigned long end = vma->vm_end; + int ret; /* * Don't copy ptes where a page fault will fill them correctly. @@ -603,25 +605,39 @@ * readonly mappings. The tradeoff is that copy_page_range is more * efficient than faulting. 
*/ + ret = 0; if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { if (!vma->anon_vma) - return 0; + goto out; } - if (is_vm_hugetlb_page(vma)) - return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (unlikely(is_vm_hugetlb_page(vma))) { + ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); + goto out; + } + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + + ret = 0; dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { next = pgd_addr_end(addr, end); if (pgd_none_or_clear_bad(src_pgd)) continue; - if (copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd, - vma, addr, next)) - return -ENOMEM; + if (unlikely(copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd, + vma, addr, next))) { + ret = -ENOMEM; + break; + } } while (dst_pgd++, src_pgd++, addr = next, addr != end); - return 0; + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); +out: + return ret; } static unsigned long zap_pte_range(struct mmu_gather *tlb, @@ -825,7 +841,9 @@ unsigned long start = start_addr; spinlock_t *i_mmap_lock = details? 
details->i_mmap_lock: NULL; int fullmm = (*tlbp)->fullmm; + struct mm_struct *mm = vma->vm_mm; + mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -876,6 +894,7 @@ } } out: + mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ } @@ -1463,10 +1482,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1494,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1696,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. */ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,9 @@ #include #include #include +#include +#include +#include #include #include @@ -2038,6 +2041,7 @@ /* mm's last user has gone, and its about to be pulled down */ arch_exit_mmap(mm); + mmu_notifier_release(mm); lru_add_drain(); flush_cache_mm(mm); @@ -2242,3 +2246,144 @@ return 0; } + +static int mm_lock_cmp(const void *a, const void *b) +{ + unsigned long _a = (unsigned long)*(spinlock_t **)a; + unsigned long _b = (unsigned long)*(spinlock_t **)b; + + cond_resched(); + if (_a < _b) + return -1; + if (_a > _b) + return 1; + return 0; +} + +static unsigned long mm_lock_sort(struct mm_struct *mm, spinlock_t **locks, + int anon) +{ + struct vm_area_struct *vma; + size_t i = 0; + + for (vma = mm->mmap; vma; vma = 
vma->vm_next) { + if (anon) { + if (vma->anon_vma) + locks[i++] = &vma->anon_vma->lock; + } else { + if (vma->vm_file && vma->vm_file->f_mapping) + locks[i++] = &vma->vm_file->f_mapping->i_mmap_lock; + } + } + + if (!i) + goto out; + + sort(locks, i, sizeof(spinlock_t *), mm_lock_cmp, NULL); + +out: + return i; +} + +static inline unsigned long mm_lock_sort_anon_vma(struct mm_struct *mm, + spinlock_t **locks) +{ + return mm_lock_sort(mm, locks, 1); +} + +static inline unsigned long mm_lock_sort_i_mmap(struct mm_struct *mm, + spinlock_t **locks) +{ + return mm_lock_sort(mm, locks, 0); +} + +static void mm_lock_unlock(spinlock_t **locks, size_t nr, int lock) +{ + spinlock_t *last = NULL; + size_t i; + + for (i = 0; i < nr; i++) + /* Multiple vmas may use the same lock. */ + if (locks[i] != last) { + BUG_ON((unsigned long) last > (unsigned long) locks[i]); + last = locks[i]; + if (lock) + spin_lock(last); + else + spin_unlock(last); + } +} + +static inline void __mm_lock(spinlock_t **locks, size_t nr) +{ + mm_lock_unlock(locks, nr, 1); +} + +static inline void __mm_unlock(spinlock_t **locks, size_t nr) +{ + mm_lock_unlock(locks, nr, 0); +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. 
+ */ +int mm_lock(struct mm_struct *mm, struct mm_lock_data *data) +{ + spinlock_t **anon_vma_locks, **i_mmap_locks; + + down_write(&mm->mmap_sem); + if (mm->map_count) { + anon_vma_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); + if (unlikely(!anon_vma_locks)) { + up_write(&mm->mmap_sem); + return -ENOMEM; + } + + i_mmap_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); + if (unlikely(!i_mmap_locks)) { + up_write(&mm->mmap_sem); + vfree(anon_vma_locks); + return -ENOMEM; + } + + data->nr_anon_vma_locks = mm_lock_sort_anon_vma(mm, anon_vma_locks); + data->nr_i_mmap_locks = mm_lock_sort_i_mmap(mm, i_mmap_locks); + + if (data->nr_anon_vma_locks) { + __mm_lock(anon_vma_locks, data->nr_anon_vma_locks); + data->anon_vma_locks = anon_vma_locks; + } else + vfree(anon_vma_locks); + + if (data->nr_i_mmap_locks) { + __mm_lock(i_mmap_locks, data->nr_i_mmap_locks); + data->i_mmap_locks = i_mmap_locks; + } else + vfree(i_mmap_locks); + } + return 0; +} + +static void mm_unlock_vfree(spinlock_t **locks, size_t nr) +{ + __mm_unlock(locks, nr); + vfree(locks); +} + +/* avoid memory allocations for mm_unlock to prevent deadlock */ +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) +{ + if (mm->map_count) { + if (data->nr_anon_vma_locks) + mm_unlock_vfree(data->anon_vma_locks, + data->nr_anon_vma_locks); + if (data->i_mmap_locks) + mm_unlock_vfree(data->i_mmap_locks, + data->nr_i_mmap_locks); + } + up_write(&mm->mmap_sem); +} diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,241 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +/* + * This function can't run concurrently against mmu_notifier_register + * or any other mmu notifier method. mmu_notifier_register can only + * run with mm->mm_users > 0 (and exit_mmap runs only when mm_users is + * zero). All other tasks of this mm already quit so they can't invoke + * mmu notifiers anymore. This can run concurrently only against + * mmu_notifier_unregister and it serializes against it with the + * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go + * away from under us as the exit_mmap holds a mm_count pin itself. + * + * The ->release method can't allow the module to be unloaded, the + * module can only be unloaded after mmu_notifier_unregister has run. This + * is because the release method has to run the ret instruction to + * return back here, and so it can't allow the ret instruction to be + * freed. + */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + int srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + spin_lock(&mm->mmu_notifier_mm->unregister_lock); + while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) { + mn = hlist_entry(mm->mmu_notifier_mm->list.first, + struct mmu_notifier, + hlist); + /* + * We arrived before mmu_notifier_unregister so + * mmu_notifier_unregister will do nothing other than + * wait for ->release to finish and + * mmu_notifier_unregister to return. + */ + hlist_del_init(&mn->hlist); + /* + * If ->release runs before mmu_notifier_unregister it + * must be handled as it's the only way for the driver + * to flush all existing sptes before the pages in the + * mm are freed. 
+ */ + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); + /* SRCU will block mmu_notifier_unregister */ + mn->ops->release(mn, mm); + spin_lock(&mm->mmu_notifier_mm->unregister_lock); + } + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + + /* + * Wait for ->release if mmu_notifier_unregister ran list_del_rcu. + * srcu can't go away from under us because one mm_count is + * held by exit_mmap. + */ + synchronize_srcu(&mm->mmu_notifier_mm->srcu); +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending on whether the mapping + * previously existed. + */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0, srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, address); + } + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, start, end); + } + 
srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); +} + +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, start, end); + } + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); +} + +/* + * Must not hold mmap_sem nor any other VM related lock when calling + * this registration function. Must also ensure mm_users can't go down + * to zero while this runs to avoid races with mmu_notifier_release, + * so mm has to be current->mm or the mm should be pinned safely like + * with get_task_mm(). mmput can be called after mmu_notifier_register + * returns. mmu_notifier_unregister must be always called to + * unregister the notifier. mm_count is automatically pinned to allow + * mmu_notifier_unregister to safely run at any time later, before or + * after exit_mmap. ->release will always be called before exit_mmap + * frees the pages. 
+ */ +int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data data; + int ret; + + BUG_ON(atomic_read(&mm->mm_users) <= 0); + + ret = mm_lock(mm, &data); + if (unlikely(ret)) + goto out; + + if (!mm_has_notifiers(mm)) { + mm->mmu_notifier_mm = kmalloc(sizeof(struct mmu_notifier_mm), + GFP_KERNEL); + ret = -ENOMEM; + if (unlikely(!mm_has_notifiers(mm))) + goto out_unlock; + + ret = init_srcu_struct(&mm->mmu_notifier_mm->srcu); + if (unlikely(ret)) { + kfree(mm->mmu_notifier_mm); + mmu_notifier_mm_init(mm); + goto out_unlock; + } + INIT_HLIST_HEAD(&mm->mmu_notifier_mm->list); + spin_lock_init(&mm->mmu_notifier_mm->unregister_lock); + } + atomic_inc(&mm->mm_count); + + hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list); +out_unlock: + mm_unlock(mm, &data); +out: + BUG_ON(atomic_read(&mm->mm_users) <= 0); + return ret; +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); + +/* this is called after the last mmu_notifier_unregister() returned */ +void __mmu_notifier_mm_destroy(struct mm_struct *mm) +{ + BUG_ON(!hlist_empty(&mm->mmu_notifier_mm->list)); + cleanup_srcu_struct(&mm->mmu_notifier_mm->srcu); + kfree(mm->mmu_notifier_mm); + mm->mmu_notifier_mm = LIST_POISON1; /* debug */ +} + +/* + * This releases the mm_count pin automatically and frees the mm + * structure if it was the last user of it. It serializes against + * running mmu notifiers with SRCU and against mmu_notifier_unregister + * with the unregister lock + SRCU. All sptes must be dropped before + * calling mmu_notifier_unregister. ->release or any other notifier + * method may be invoked concurrently with mmu_notifier_unregister, + * and only after mmu_notifier_unregister returned we're guaranteed + * that ->release or any other method can't run anymore. 
+ */ +void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) +{ + int before_release = 0, srcu; + + BUG_ON(atomic_read(&mm->mm_count) <= 0); + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + spin_lock(&mm->mmu_notifier_mm->unregister_lock); + if (!hlist_unhashed(&mn->hlist)) { + hlist_del_rcu(&mn->hlist); + before_release = 1; + } + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); + if (before_release) + /* + * exit_mmap will block in mmu_notifier_release to + * guarantee ->release is called before freeing the + * pages. + */ + mn->ops->release(mn, mm); + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + + /* wait any running method to finish, including ->release */ + synchronize_srcu(&mm->mmu_notifier_mm->srcu); + + BUG_ON(atomic_read(&mm->mm_count) <= 0); + + mmdrop(mm); +} +EXPORT_SYMBOL_GPL(mmu_notifier_unregister); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, 
old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. 
*/ if (page->index != linear_page_index(vma, address)) From okir at lst.de Thu Apr 24 02:09:40 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:09:40 +0200 Subject: [ofa-general] [PATCH 1/8]: RDS: Fix IB max_unacked_* sysctls In-Reply-To: <200804241106.57172.okir@lst.de> References: <200804241106.57172.okir@lst.de> Message-ID: <200804241109.41035.okir@lst.de> From 4c378d81c2348ac13300d033f306bfd20e65eb76 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:05 -0700 Subject: [PATCH] RDS: Fix IB max_unacked_* sysctls The sysctl variables max_unacked_{bytes,packets} are defined as unsigned longs, but the sysctl table specifies proc_dointvec as the handler. Change the variables to unsigned ints - the type is big enough. Signed-off-by: Olaf Kirch --- net/rds/rds.h | 4 ++-- net/rds/sysctl.c | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/net/rds/rds.h b/net/rds/rds.h index 2d4600a..dc1ab4c 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -667,8 +667,8 @@ extern unsigned long rds_sysctl_sndbuf_default; extern unsigned long rds_sysctl_sndbuf_max; extern unsigned long rds_sysctl_reconnect_min_jiffies; extern unsigned long rds_sysctl_reconnect_max_jiffies; -extern unsigned long rds_sysctl_max_unacked_packets; -extern unsigned long rds_sysctl_max_unacked_bytes; +extern unsigned int rds_sysctl_max_unacked_packets; +extern unsigned int rds_sysctl_max_unacked_bytes; /* threads.c */ int __init rds_threads_init(void); diff --git a/net/rds/sysctl.c b/net/rds/sysctl.c index bb0fa46..5f7ce37 100644 --- a/net/rds/sysctl.c +++ b/net/rds/sysctl.c @@ -44,8 +44,8 @@ static unsigned long rds_sysctl_reconnect_max = ~0UL; unsigned long rds_sysctl_reconnect_min_jiffies; unsigned long rds_sysctl_reconnect_max_jiffies = HZ; -unsigned long rds_sysctl_max_unacked_packets = 16; -unsigned long rds_sysctl_max_unacked_bytes = (16 << 20); +unsigned int rds_sysctl_max_unacked_packets = 16; +unsigned int rds_sysctl_max_unacked_bytes = (16 << 
20); /* * These can change over time until they're official. Until that time we'll @@ -95,7 +95,7 @@ static ctl_table rds_sysctl_rds_table[] = { .ctl_name = 8, .procname = "max_unacked_packets", .data = &rds_sysctl_max_unacked_packets, - .maxlen = sizeof(unsigned long), + .maxlen = sizeof(int), .mode = 0644, .proc_handler = &proc_dointvec, }, @@ -103,7 +103,7 @@ static ctl_table rds_sysctl_rds_table[] = { .ctl_name = 9, .procname = "max_unacked_bytes", .data = &rds_sysctl_max_unacked_bytes, - .maxlen = sizeof(unsigned long), + .maxlen = sizeof(int), .mode = 0644, .proc_handler = &proc_dointvec, }, -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:09:51 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:09:51 +0200 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: <200804241108.58748.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241108.58748.okir@lst.de> Message-ID: <200804241109.52448.okir@lst.de> From b1092d9002fec323aaaf42dcbff88b2f46d4f3d5 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:34 -0700 Subject: [PATCH] mthca/mlx4: avoid recycling old FMR R_Keys too soon When a FMR is unmapped, mthca and mlx4 reset the map count to 0, and clear the upper part of the R_Key which is used as the sequence counter. This poses a problem for RDS, which uses ib_fmr_unmap as a fence operation. RDS assumes that after issuing an unmap, the old R_Keys will be invalid for a "reasonable" period of time. For instance, Oracle processes use shared memory buffers allocated from a pool of buffers. When a process dies, we want to reclaim these buffers - but we must make sure there are no pending RDMA operations to/from those buffers. The only way to achieve that is by using unmap and syncing the TPT. 
However, when the sequence count is reset on unmap, there is a high likelihood that a new mapping will be given the same R_Key that was issued a few milliseconds ago. To prevent this, we suggest to not reset the sequence count when unmapping a FMR. Signed-off-by: Olaf Kirch --- drivers/infiniband/hw/mthca/mthca_mr.c | 13 ------------- drivers/net/mlx4/mr.c | 6 ------ 2 files changed, 0 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c index aa6c70a..e4f83cb 100644 --- a/drivers/infiniband/hw/mthca/mthca_mr.c +++ b/drivers/infiniband/hw/mthca/mthca_mr.c @@ -814,15 +814,9 @@ int mthca_arbel_map_phys_fmr(struct ib_fmr *ibfmr, u64 *page_list, void mthca_tavor_fmr_unmap(struct mthca_dev *dev, struct mthca_fmr *fmr) { - u32 key; - if (!fmr->maps) return; - key = tavor_key_to_hw_index(fmr->ibmr.lkey); - key &= dev->limits.num_mpts - 1; - fmr->ibmr.lkey = fmr->ibmr.rkey = tavor_hw_index_to_key(key); - fmr->maps = 0; writeb(MTHCA_MPT_STATUS_SW, fmr->mem.tavor.mpt); @@ -830,16 +824,9 @@ void mthca_tavor_fmr_unmap(struct mthca_dev *dev, struct mthca_fmr *fmr) void mthca_arbel_fmr_unmap(struct mthca_dev *dev, struct mthca_fmr *fmr) { - u32 key; - if (!fmr->maps) return; - key = arbel_key_to_hw_index(fmr->ibmr.lkey); - key &= dev->limits.num_mpts - 1; - key = adjust_key(dev, key); - fmr->ibmr.lkey = fmr->ibmr.rkey = arbel_hw_index_to_key(key); - fmr->maps = 0; *(u8 *) fmr->mem.arbel.mpt = MTHCA_MPT_STATUS_SW; diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index 0c05a10..b9e57b0 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -602,15 +602,9 @@ EXPORT_SYMBOL_GPL(mlx4_fmr_enable); void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u32 *lkey, u32 *rkey) { - u32 key; - if (!fmr->maps) return; - key = key_to_hw_index(fmr->mr.key); - key &= dev->caps.num_mpts - 1; - *lkey = *rkey = fmr->mr.key = hw_index_to_key(key); - fmr->maps = 0; *(u8 *) fmr->mpt = 
MLX4_MPT_STATUS_SW; -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:11:26 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:11:26 +0200 Subject: [ofa-general] Re: [PATCH 4/8]: RDS: Increase the default number of WRs In-Reply-To: <200804241110.51026.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241109.52448.okir@lst.de> <200804241110.51026.okir@lst.de> Message-ID: <200804241111.26726.okir@lst.de> From 8ee794c0530f6e5f5fe81bc78b5e09be8f4b1eda Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:35 -0700 Subject: [PATCH] RDS: Increase the default number of WRs The default number of send and receive WRs was way too low to be useful. Increment this to 256 send WRs and 1024 recv WRs. Signed-off-by: Olaf Kirch --- net/rds/ib.h | 3 +++ net/rds/ib_sysctl.c | 10 ++++------ 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/net/rds/ib.h b/net/rds/ib.h index fd0b2d8..2c6e809 100644 --- a/net/rds/ib.h +++ b/net/rds/ib.h @@ -13,6 +13,9 @@ #define RDS_IB_MAX_SGE 8 #define RDS_IB_RECV_SGE 2 +#define RDS_IB_DEFAULT_RECV_WR 1024 +#define RDS_IB_DEFAULT_SEND_WR 256 + /* * IB posts RDS_FRAG_SIZE fragments of pages to the receive queues to * try and minimize the amount of memory tied up both the device and diff --git a/net/rds/ib_sysctl.c b/net/rds/ib_sysctl.c index 813b1a6..b8a10fc 100644 --- a/net/rds/ib_sysctl.c +++ b/net/rds/ib_sysctl.c @@ -38,18 +38,16 @@ static struct ctl_table_header *rds_ib_sysctl_hdr; -/* default to what we hope will be order 0 allocations */ -unsigned long rds_ib_sysctl_max_send_wr = PAGE_SIZE / sizeof(struct ib_send_wr); -unsigned long rds_ib_sysctl_max_recv_wr = PAGE_SIZE / sizeof(struct ib_recv_wr); +unsigned long rds_ib_sysctl_max_send_wr = RDS_IB_DEFAULT_SEND_WR; +unsigned long rds_ib_sysctl_max_recv_wr = RDS_IB_DEFAULT_RECV_WR; unsigned long 
rds_ib_sysctl_max_recv_allocation = (128 * 1024 * 1024) / RDS_FRAG_SIZE; static unsigned long rds_ib_sysctl_max_wr_min = 1; /* hardware will fail CQ creation long before this */ static unsigned long rds_ib_sysctl_max_wr_max = (u32)~0; -/* default to rds_ib_sysctl_max_send_wr/4 */ -unsigned long rds_ib_sysctl_max_unsig_wrs = PAGE_SIZE / (4 * sizeof(struct ib_send_wr)); +unsigned long rds_ib_sysctl_max_unsig_wrs = 16; static unsigned long rds_ib_sysctl_max_unsig_wr_min = 1; -static unsigned long rds_ib_sysctl_max_unsig_wr_max = PAGE_SIZE / sizeof(struct ib_send_wr); +static unsigned long rds_ib_sysctl_max_unsig_wr_max = 64; unsigned long rds_ib_sysctl_max_unsig_bytes = (16 << 20); static unsigned long rds_ib_sysctl_max_unsig_bytes_min = 1; -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:11:56 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:11:56 +0200 Subject: [ofa-general] Re: [PATCH 5/8]: RDS: Two small code reorgs in the connection code In-Reply-To: <200804241111.26726.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241110.51026.okir@lst.de> <200804241111.26726.okir@lst.de> Message-ID: <200804241111.56693.okir@lst.de> From 2962a7fd8472d068913d0de74a12159d5438f408 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:35 -0700 Subject: [PATCH] RDS: Two small code reorgs in the connection code This changes two things in the connection code 1. When we create a new connection, we need to set various fields of struct rds_connection to 0. Instead of doing them one by one, use memset. 2. The code for destroying a connection is currently inside a loop in rds_conn_exit. Move it to a separate function, because it's needed by a subsequent patch. 
Signed-off-by: Olaf Kirch --- net/rds/connection.c | 89 ++++++++++++++++++++++++-------------------------- 1 files changed, 43 insertions(+), 46 deletions(-) diff --git a/net/rds/connection.c b/net/rds/connection.c index ecf71b9..585123a 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -149,6 +149,8 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, goto out; } + memset(conn, 0, sizeof(*conn)); + /* hash_node below */ conn->c_laddr = laddr; conn->c_faddr = faddr; @@ -156,21 +158,9 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, conn->c_next_tx_seq = 1; init_MUTEX(&conn->c_send_sem); - conn->c_xmit_rm = NULL; - conn->c_xmit_sg = 0; - conn->c_xmit_hdr_off = 0; - conn->c_xmit_data_off = 0; - INIT_LIST_HEAD(&conn->c_send_queue); INIT_LIST_HEAD(&conn->c_retrans); - conn->c_next_rx_seq = 0; - - conn->c_map_queued = 0; - conn->c_map_offset = 0; - conn->c_map_bytes = 0; - conn->c_version = 0; - ret = rds_cong_get_maps(conn); if (ret) { kmem_cache_free(rds_conn_slab, conn); @@ -240,6 +230,46 @@ struct rds_connection *rds_conn_create_outgoing(__be32 laddr, __be32 faddr, EXPORT_SYMBOL_GPL(rds_conn_create); EXPORT_SYMBOL_GPL(rds_conn_create_outgoing); +static void __rds_conn_destroy(struct rds_connection *conn) +{ + struct rds_message *rm, *rtmp; + + rdsdebug("freeing conn %p for %u.%u.%u.%u -> " + "%u.%u.%u.%u\n", conn, NIPQUAD(conn->c_laddr), + NIPQUAD(conn->c_faddr)); + + /* wait for the rds thread to shut it down */ + atomic_set(&conn->c_state, RDS_CONN_ERROR); + cancel_delayed_work(&conn->c_conn_w); + queue_work(rds_wq, &conn->c_down_w); + flush_workqueue(rds_wq); + + /* tear down queued messages */ + list_for_each_entry_safe(rm, rtmp, + &conn->c_send_queue, + m_conn_item) { + list_del_init(&rm->m_conn_item); + BUG_ON(!list_empty(&rm->m_sock_item)); + rds_message_put(rm); + } + if (conn->c_xmit_rm) + rds_message_put(conn->c_xmit_rm); + + conn->c_trans->conn_free(conn->c_transport_data); + + /* + * 
The congestion maps aren't freed up here. They're + * freed by rds_cong_exit() after all the connections + * have been freed. + */ + rds_cong_remove_conn(conn); + + BUG_ON(!list_empty(&conn->c_retrans)); + kmem_cache_free(rds_conn_slab, conn); + + rds_conn_count--; +} + static void rds_conn_message_info(struct socket *sock, unsigned int len, struct rds_info_iterator *iter, struct rds_info_lengths *lens, @@ -376,7 +406,6 @@ void __exit rds_conn_exit(void) struct hlist_head *head; struct hlist_node *pos, *tmp; struct rds_connection *conn; - struct rds_message *rm, *rtmp; size_t i; for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash); @@ -385,40 +414,8 @@ void __exit rds_conn_exit(void) /* the conn won't reconnect once it's unhashed */ hlist_del_init(&conn->c_hash_node); - rds_conn_count--; - - rdsdebug("freeing conn %p for %u.%u.%u.%u -> " - "%u.%u.%u.%u\n", conn, NIPQUAD(conn->c_laddr), - NIPQUAD(conn->c_faddr)); - - /* wait for the rds thread to shut it down */ - atomic_set(&conn->c_state, RDS_CONN_ERROR); - cancel_delayed_work(&conn->c_conn_w); - queue_work(rds_wq, &conn->c_down_w); - flush_workqueue(rds_wq); - - /* tear down queued messages */ - list_for_each_entry_safe(rm, rtmp, - &conn->c_send_queue, - m_conn_item) { - list_del_init(&rm->m_conn_item); - BUG_ON(!list_empty(&rm->m_sock_item)); - rds_message_put(rm); - } - if (conn->c_xmit_rm) - rds_message_put(conn->c_xmit_rm); - - conn->c_trans->conn_free(conn->c_transport_data); - - /* - * The congestion maps aren't freed up here. They're - * freed by rds_cong_exit() after all the connections - * have been freed. 
- */
-	rds_cong_remove_conn(conn);
-	BUG_ON(!list_empty(&conn->c_retrans));
-	kmem_cache_free(rds_conn_slab, conn);
+	__rds_conn_destroy(conn);
 	}
 }
-- 
1.5.4.rc3

-- 
Olaf Kirch  |  --- o ---   Nous sommes du soleil we love when we play
okir at lst.de |    / | \     sol.dhoop.naytheet.ah kin.ir.samse.qurax

From okir at lst.de  Thu Apr 24 02:12:19 2008
From: okir at lst.de (Olaf Kirch)
Date: Thu, 24 Apr 2008 11:12:19 +0200
Subject: [ofa-general] Re: [PATCH 6/8]: RDS: Use IB for loopback
In-Reply-To: <200804241111.56693.okir@lst.de>
References: <200804241106.57172.okir@lst.de> <200804241111.26726.okir@lst.de> <200804241111.56693.okir@lst.de>
Message-ID: <200804241112.19866.okir@lst.de>

From 2a91ce118f8d4e7e644ea849f61bd8953faaacc6 Mon Sep 17 00:00:00 2001
From: Olaf Kirch
Date: Thu, 24 Apr 2008 00:27:36 -0700
Subject: [PATCH] RDS: Use IB for loopback

Currently, when an application wants to send to an RDS port on the local host, RDS will create a connection using the special loopback transport. In order to be able to test RDS (and RDS over RDMA) faithfully on standalone machines, we want loopback traffic to use the IB transport if possible. This patch makes the necessary changes.

This turns out to be a little tricky, as we need two rds_connection objects with the same address pair. The current code doesn't really handle this, so we have to jump through some hoops:

- Loopback connections for IB are represented by two rds_connections: the "active" connection created when we initiate the connect, and a "passive" connection created when we accept the incoming RC.
- The active connection is used to transmit packets, which are then received by the passive conn.
- The passive conn is never added to the global hash table; instead it is kept in conn->c_passive.
Signed-off-by: Olaf Kirch --- net/rds/connection.c | 42 +++++++++++++++++++++++++++++++++++------- net/rds/rds.h | 3 +++ net/rds/tcp.c | 1 + net/rds/threads.c | 10 +++++++++- 4 files changed, 48 insertions(+), 8 deletions(-) diff --git a/net/rds/connection.c b/net/rds/connection.c index 585123a..5d7788e 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -130,15 +130,26 @@ void rds_conn_reset(struct rds_connection *conn) */ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, struct rds_transport *trans, gfp_t gfp, - int allow_loop_transport) + int is_outgoing) { - struct rds_connection *conn, *tmp; + struct rds_connection *conn, *tmp, *parent = NULL; struct hlist_head *head = rds_conn_bucket(laddr, faddr); unsigned long flags; int ret; spin_lock_irqsave(&rds_conn_lock, flags); conn = rds_conn_lookup(head, laddr, faddr, trans); + if (conn + && conn->c_loopback + && conn->c_trans != &rds_loop_transport + && !is_outgoing) { + /* This is a looped back IB connection, and we're + * called by the code handling the incoming connect. + * We need a second connection object into which we + * can stick the other QP. */ + parent = conn; + conn = parent->c_passive; + } spin_unlock_irqrestore(&rds_conn_lock, flags); if (conn) goto out; @@ -151,7 +162,7 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, memset(conn, 0, sizeof(*conn)); - /* hash_node below */ + INIT_HLIST_NODE(&conn->c_hash_node); conn->c_laddr = laddr; conn->c_faddr = faddr; spin_lock_init(&conn->c_lock); @@ -173,8 +184,16 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, * can bind to the destination address then we'd rather the messages * flow through loopback rather than either transport. 
*/ - if (allow_loop_transport && rds_trans_get_preferred(faddr)) - trans = &rds_loop_transport; + if (rds_trans_get_preferred(faddr)) { + conn->c_loopback = 1; + if (is_outgoing && trans->t_prefer_loopback) { + /* "outgoing" connection - and the transport + * says it wants the connection handled by the + * loopback transport. This is what TCP does. + */ + trans = &rds_loop_transport; + } + } conn->c_trans = trans; @@ -198,14 +217,21 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, NIPQUAD(laddr), NIPQUAD(faddr)); spin_lock_irqsave(&rds_conn_lock, flags); - tmp = rds_conn_lookup(head, laddr, faddr, trans); + if (parent == NULL) { + tmp = rds_conn_lookup(head, laddr, faddr, trans); + if (tmp == NULL) + hlist_add_head(&conn->c_hash_node, head); + } else { + if ((tmp = parent->c_passive) == NULL) + parent->c_passive = conn; + } + if (tmp) { trans->conn_free(conn->c_transport_data); kmem_cache_free(rds_conn_slab, conn); conn = tmp; } else { rds_cong_add_conn(conn); - hlist_add_head(&conn->c_hash_node, head); rds_conn_count++; } @@ -415,6 +441,8 @@ void __exit rds_conn_exit(void) /* the conn won't reconnect once it's unhashed */ hlist_del_init(&conn->c_hash_node); + if (conn->c_passive) + __rds_conn_destroy(conn->c_passive); __rds_conn_destroy(conn); } } diff --git a/net/rds/rds.h b/net/rds/rds.h index dc1ab4c..d5a966d 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -121,6 +121,8 @@ struct rds_connection { struct hlist_node c_hash_node; __be32 c_laddr; __be32 c_faddr; + unsigned int c_loopback : 1; + struct rds_connection * c_passive; spinlock_t c_lock; struct rds_cong_map *c_lcong; @@ -342,6 +344,7 @@ struct rds_transport { struct list_head t_item; struct module *t_owner; char *t_name; + unsigned int t_prefer_loopback : 1; int (*laddr_check)(__be32 addr); int (*conn_alloc)(struct rds_connection *conn, gfp_t gfp); void (*conn_free)(void *data); diff --git a/net/rds/tcp.c b/net/rds/tcp.c index baf876e..f4e6fce 100644 --- a/net/rds/tcp.c +++ 
b/net/rds/tcp.c @@ -252,6 +252,7 @@ struct rds_transport rds_tcp_transport = { .exit = rds_tcp_exit, .t_owner = THIS_MODULE, .t_name = "tcp", + .t_prefer_loopback = 1, }; int __init rds_tcp_init(void) diff --git a/net/rds/threads.c b/net/rds/threads.c index 2a5dc0b..b86fbc3 100644 --- a/net/rds/threads.c +++ b/net/rds/threads.c @@ -178,6 +178,11 @@ void rds_shutdown_worker(struct work_struct *work) up(&conn->c_send_sem); if (!rds_conn_transition(conn, RDS_CONN_DISCONNECTING, RDS_CONN_DOWN)) { + /* This can happen - eg when we're in the middle of tearing + * down the connection, and someone unloads the rds module. + * Quite reproduceable with loopback connections. + * Mostly harmless. + */ rds_conn_error(conn, "%s: failed to transition to state DOWN, " "current state is %d\n", @@ -187,7 +192,10 @@ void rds_shutdown_worker(struct work_struct *work) } } - /* then reconnect if it's still live */ + /* Then reconnect if it's still live. + * The passive side of an IB loopback connection is never added + * to the conn hash, so we never trigger a reconnect on this + * conn - the reconnect is always triggered by the active peer. 
*/ cancel_delayed_work(&conn->c_conn_w); if (!hlist_unhashed(&conn->c_hash_node)) { rds_queue_reconnect(conn); -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:13:08 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:13:08 +0200 Subject: [ofa-general] Re: [PATCH 7/8]: RDS: Implement rds ping In-Reply-To: <200804241112.19866.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241111.56693.okir@lst.de> <200804241112.19866.okir@lst.de> Message-ID: <200804241113.08841.okir@lst.de> From 24000a7c11fedb519aab11807703d91ae49ac421 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:36 -0700 Subject: [PATCH] RDS: Implement rds ping Several people have asked for a way to test reachability of remote nodes via RDS. This is it - rds ping. RDS ping is implemented by sending packets to port 0. As a matter of simplicity, we do not handle packet payloads at this time - the ping response is always an empty packet. 
Signed-off-by: Olaf Kirch --- net/rds/cong.c | 2 +- net/rds/rds.h | 5 ++++ net/rds/recv.c | 6 +++++ net/rds/send.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++- net/rds/stats.c | 2 + net/rds/sysctl.c | 10 +++++++++ 6 files changed, 78 insertions(+), 3 deletions(-) diff --git a/net/rds/cong.c b/net/rds/cong.c index 2db2362..4ec85ce 100644 --- a/net/rds/cong.c +++ b/net/rds/cong.c @@ -348,7 +348,7 @@ int rds_cong_wait(struct rds_cong_map *map, __be16 port, int nonblock, struct rd if (!rds_cong_test_bit(map, port)) return 0; if (nonblock) { - if (rs->rs_cong_monitor) { + if (rs && rs->rs_cong_monitor) { unsigned long flags; /* It would have been nice to have an atomic set_bit on diff --git a/net/rds/rds.h b/net/rds/rds.h index d5a966d..a0fb20c 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -487,6 +487,7 @@ struct rds_statistics { unsigned long s_recv_delayed_retry; unsigned long s_recv_ack_required; unsigned long s_recv_rdma_bytes; + unsigned long s_recv_ping; unsigned long s_send_queue_empty; unsigned long s_send_queue_full; unsigned long s_send_sem_contention; @@ -497,6 +498,7 @@ struct rds_statistics { unsigned long s_send_ack_required; unsigned long s_send_rdma; unsigned long s_send_rdma_bytes; + unsigned long s_send_pong; unsigned long s_page_remainder_hit; unsigned long s_page_remainder_miss; unsigned long s_cong_update_queued; @@ -570,6 +572,7 @@ rds_conn_up(struct rds_connection *conn) } /* message.c */ +struct rds_message *rds_message_alloc(unsigned int nents, gfp_t gfp); struct rds_message *rds_message_copy_from_user(struct iovec *first_iov, size_t total_len); void rds_message_populate_header(struct rds_header *hdr, __be16 sport, @@ -641,6 +644,7 @@ void rds_send_drop_acked(struct rds_connection *conn, u64 ack, is_acked_func is_acked); int rds_send_acked_before(struct rds_connection *conn, u64 seq); void rds_send_remove_from_sock(struct list_head *messages, int status); +int rds_send_pong(struct rds_connection *conn, __be16 dport); /* rdma.c 
*/ void rds_rdma_unuse(struct rds_sock *rs, u32 r_key, int force); @@ -672,6 +676,7 @@ extern unsigned long rds_sysctl_reconnect_min_jiffies; extern unsigned long rds_sysctl_reconnect_max_jiffies; extern unsigned int rds_sysctl_max_unacked_packets; extern unsigned int rds_sysctl_max_unacked_bytes; +extern unsigned int rds_sysctl_ping_enable; /* threads.c */ int __init rds_threads_init(void); diff --git a/net/rds/recv.c b/net/rds/recv.c index 9adb24d..da3c879 100644 --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -196,6 +196,12 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr, } conn->c_next_rx_seq = be64_to_cpu(inc->i_hdr.h_sequence) + 1; + if (rds_sysctl_ping_enable && inc->i_hdr.h_dport == 0) { + rds_stats_inc(s_recv_ping); + rds_send_pong(conn, inc->i_hdr.h_sport); + goto out; + } + rs = rds_find_bound(daddr, inc->i_hdr.h_dport); if (rs == NULL) { rds_stats_inc(s_recv_drop_no_sock); diff --git a/net/rds/send.c b/net/rds/send.c index a2a5b2a..26e1e3e 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -700,8 +700,7 @@ int rds_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, if (msg->msg_namelen) { /* XXX fail non-unicast destination IPs? */ - if (msg->msg_namelen < sizeof(*usin) || usin->sin_family != AF_INET || - usin->sin_port == 0) { + if (msg->msg_namelen < sizeof(*usin) || usin->sin_family != AF_INET) { ret = -EINVAL; goto out; } @@ -820,3 +819,56 @@ out: rds_message_put(rm); return ret; } + +/* + * Reply to a ping packet. + */ +int +rds_send_pong(struct rds_connection *conn, __be16 dport) +{ + struct rds_message *rm; + unsigned long flags; + int ret = 0; + + rm = rds_message_alloc(0, GFP_ATOMIC); + if (rm == NULL) { + ret = -ENOMEM; + goto out; + } + + rm->m_daddr = conn->c_faddr; + + /* If the connection is down, trigger a connect. We may + * have scheduled a delayed reconnect however - in this case + * we should not interfere. 
+ */ + if (rds_conn_state(conn) == RDS_CONN_DOWN + && !test_and_set_bit(RDS_RECONNECT_PENDING, &conn->c_flags)) + queue_delayed_work(rds_wq, &conn->c_conn_w, 0); + + ret = rds_cong_wait(conn->c_fcong, dport, 1, NULL); + if (ret) + goto out; + + spin_lock_irqsave(&conn->c_lock, flags); + list_add_tail(&rm->m_conn_item, &conn->c_send_queue); + set_bit(RDS_MSG_ON_CONN, &rm->m_flags); + rds_message_addref(rm); + rm->m_inc.i_conn = conn; + + rds_message_populate_header(&rm->m_inc.i_hdr, 0, dport, + conn->c_next_tx_seq); + conn->c_next_tx_seq++; + spin_unlock_irqrestore(&conn->c_lock, flags); + + rds_stats_inc(s_send_pong); + + queue_delayed_work(rds_wq, &conn->c_send_w, 0); + rds_message_put(rm); + return 0; + +out: + if (rm) + rds_message_put(rm); + return ret; +} diff --git a/net/rds/stats.c b/net/rds/stats.c index abf7103..0bd91fa 100644 --- a/net/rds/stats.c +++ b/net/rds/stats.c @@ -53,6 +53,7 @@ static char *rds_stat_names[] = { "recv_delayed_retry", "recv_ack_required", "recv_rdma_bytes", + "recv_ping", "send_queue_empty", "send_queue_full", "send_sem_contention", @@ -63,6 +64,7 @@ static char *rds_stat_names[] = { "send_ack_required", "send_rdma", "send_rdma_bytes", + "send_pong", "page_remainder_hit", "page_remainder_miss", "cong_update_queued", diff --git a/net/rds/sysctl.c b/net/rds/sysctl.c index 5f7ce37..7b18c0a 100644 --- a/net/rds/sysctl.c +++ b/net/rds/sysctl.c @@ -47,6 +47,8 @@ unsigned long rds_sysctl_reconnect_max_jiffies = HZ; unsigned int rds_sysctl_max_unacked_packets = 16; unsigned int rds_sysctl_max_unacked_bytes = (16 << 20); +unsigned int rds_sysctl_ping_enable = 1; + /* * These can change over time until they're official. Until that time we'll * give apps a way to figure out what the values are in a given machine. 
@@ -107,6 +109,14 @@ static ctl_table rds_sysctl_rds_table[] = { .mode = 0644, .proc_handler = &proc_dointvec, }, + { + .ctl_name = 10, + .procname = "ping_enable", + .data = &rds_sysctl_ping_enable, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, /* 100+ are reserved for transport subdirs */ { .ctl_name = 0} }; -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:14:50 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:14:50 +0200 Subject: [ofa-general] [PATCH 0/8] RDS patch set Message-ID: <200804241114.51260.okir@lst.de> Hi all, here's another set of patches related to RDS. The patches can be found in git://git.openfabrics.org/ofed_1_3/linux-2.6 and git://git.openfabrics.org/ofed_1_3/rds-tools There are seven kernel patches. I would very much like to see the first four of them in OFED 1.3.1 if possible. On the remaining 3, I'm not particularly religious - I'm fine if they make it into 1.3.* at a later time. RDS: Fix IB max_unacked_* sysctls Straightforward bugfix. mthca/mlx4: avoid recycling old FMR R_Keys too soon This is a re-run of a mthca patch I posted a while back; Jack Morgenstein requested that I should make the same change in the mlx4 driver. Here it is; review and feedback much appreciated. Reduce struct rds_ib_send_work size RDS: Increase the default number of WRs These two patches go together; they shrink the size of the send work entry we allocate in favor of allocating more of them. I would very much like to see these in OFED 1.3.1 RDS: Two small code reorgs in the connection code RDS: Use IB for loopback These also go together. For loopback traffic, we need to use IB if available, instead of the special loopback transport currently used. The reason is that lots of our tests run on single hosts over loopback, and we want to stress things like RDMA. 
RDS: Implement rds ping

This is really a new feature. Essentially, ping over RDS. There's a companion patch to rds-tools that implements the rds-ping user space utility that leverages the functionality added by the kernel patch above.

Olaf
-- 
Olaf Kirch  |  --- o ---   Nous sommes du soleil we love when we play
okir at lst.de |    / | \     sol.dhoop.naytheet.ah kin.ir.samse.qurax

From holt at sgi.com  Thu Apr 24 02:51:12 2008
From: holt at sgi.com (Robin Holt)
Date: Thu, 24 Apr 2008 04:51:12 -0500
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
In-Reply-To: <20080424064753.GH24536@duo.random>
References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random>
Message-ID: <20080424095112.GC30298@sgi.com>

I am not certain of this, but it seems like this patch leaves things in a somewhat asymmetric state. At the very least, I think that asymmetry should be documented in the comments of either mmu_notifier.h or .c.

Before I do the first mmu_notifier_register, all places that test for mm_has_notifiers(mm) will return false and take the fast path. After I do some mmu_notifier_register()s and their corresponding mmu_notifier_unregister()s, mm_has_notifiers(mm) will return true and the slow path will be taken, even though all registered notifiers have unregistered.

It seems to me the work done by mmu_notifier_mm_destroy should really be done inside the mm_lock()/mm_unlock area of mmu_unregister and mm_notifier_release when we have removed the last entry. That would give the user's job the same performance after they are done using the special device that they had prior to its use.

On Thu, Apr 24, 2008 at 08:49:40AM +0200, Andrea Arcangeli wrote:
...
> diff --git a/mm/memory.c b/mm/memory.c
> --- a/mm/memory.c
> +++ b/mm/memory.c
...
> @@ -603,25 +605,39 @@
> * readonly mappings.
The tradeoff is that copy_page_range is more > * efficient than faulting. > */ > + ret = 0; > if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { > if (!vma->anon_vma) > - return 0; > + goto out; > } > > - if (is_vm_hugetlb_page(vma)) > - return copy_hugetlb_page_range(dst_mm, src_mm, vma); > + if (unlikely(is_vm_hugetlb_page(vma))) { > + ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); > + goto out; > + } > > + if (is_cow_mapping(vma->vm_flags)) > + mmu_notifier_invalidate_range_start(src_mm, addr, end); > + > + ret = 0; I don't think this is needed. ... > +/* avoid memory allocations for mm_unlock to prevent deadlock */ > +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) > +{ > + if (mm->map_count) { > + if (data->nr_anon_vma_locks) > + mm_unlock_vfree(data->anon_vma_locks, > + data->nr_anon_vma_locks); > + if (data->i_mmap_locks) I think you really want data->nr_i_mmap_locks. ... > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c > new file mode 100644 > --- /dev/null > +++ b/mm/mmu_notifier.c ... > +/* > + * This function can't run concurrently against mmu_notifier_register > + * or any other mmu notifier method. mmu_notifier_register can only > + * run with mm->mm_users > 0 (and exit_mmap runs only when mm_users is > + * zero). All other tasks of this mm already quit so they can't invoke > + * mmu notifiers anymore. This can run concurrently only against > + * mmu_notifier_unregister and it serializes against it with the > + * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go > + * away from under us as the exit_mmap holds a mm_count pin itself. > + * > + * The ->release method can't allow the module to be unloaded, the > + * module can only be unloaded after mmu_notifier_unregister run. This > + * is because the release method has to run the ret instruction to > + * return back here, and so it can't allow the ret instruction to be > + * freed. 
> + */ The second paragraph of this comment seems extraneous. ... > + /* > + * Wait ->release if mmu_notifier_unregister run list_del_rcu. > + * srcu can't go away from under us because one mm_count is > + * hold by exit_mmap. > + */ These two sentences don't make any sense to me. ... > +void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) > +{ > + int before_release = 0, srcu; > + > + BUG_ON(atomic_read(&mm->mm_count) <= 0); > + > + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); > + spin_lock(&mm->mmu_notifier_mm->unregister_lock); > + if (!hlist_unhashed(&mn->hlist)) { > + hlist_del_rcu(&mn->hlist); > + before_release = 1; > + } > + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); > + if (before_release) > + /* > + * exit_mmap will block in mmu_notifier_release to > + * guarantee ->release is called before freeing the > + * pages. > + */ > + mn->ops->release(mn, mm); I am not certain about the need to do the release callout when the driver has already told this subsystem it is done. For XPMEM, this callout would immediately return. I would expect it to be the same or GRU. Thanks, Robin From jlentini at netapp.com Thu Apr 24 06:50:48 2008 From: jlentini at netapp.com (James Lentini) Date: Thu, 24 Apr 2008 09:50:48 -0400 (EDT) Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <20080424054235.GA11416@obsidianresearch.com> References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> <20080424054235.GA11416@obsidianresearch.com> Message-ID: On Wed, 23 Apr 2008, Jason Gunthorpe wrote: > On Wed, Apr 23, 2008 at 09:56:50AM -0400, James Lentini wrote: > > > I'm hoping that someone has a wonderfully brilliant idea for this > > > that would take about 1 day to implement. :) > > > > Is it time to bring back ATS? > > > > http://lists.openfabrics.org/pipermail/general/2005-August/010247.html > > Could you post this someplace where people who are not a member of the > DAT group can access it? 
Here's a publicly accessible link: http://www.datcollaborative.org/ATS_v1.pdf

From ogerlitz at voltaire.com  Thu Apr 24 06:52:07 2008
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 24 Apr 2008 16:52:07 +0300
Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.
In-Reply-To: <20080423133816.6c1b6315.weiny2@llnl.gov>
References: <20080423133816.6c1b6315.weiny2@llnl.gov>
Message-ID: <48109087.6030606@voltaire.com>

Ira Weiny wrote:
> The symptom is that nodes drop out of the IPoIB mcast group after a node
> temporarily goes catatonic. The details are:
>
> 1) Issues on a node cause a soft lockup of the node.
> 2) OpenSM does a normal light sweep.
> 3) MADs to the node time out since the node is in a "bad state"
> 4) OpenSM marks the node down and drops it from internal tables, including
> mcast groups.
> 5) Node recovers from soft lock up condition.
> 6) A subsequent sweep causes OpenSM see the node and add it back to the
> fabric.

As Hal noted, client reregister is the way to go. In a similar discussion in the past, the conclusion was that the SM should (maybe even according to the spec, but according to common sense as well, I think) set the re-register bit, in which case IPoIB rejoins and we are done.

At the time, I understood that openSM would do so (http://lists.openfabrics.org/pipermail/general/2007-September/041237.html). Am I wrong, or maybe the case brought up on that thread (a switch/port going down so that a whole sub-fabric is removed from the SM's point of view while the links remain up from the nodes' point of view) was different?

The basic point is a case where a node's link is UP and the SM lost this node for some time and now sees it again. We used to call it "the active/active" transition, and an SM may need special logic for it.

Or.
From okir at lst.de  Thu Apr 24 02:10:50 2008
From: okir at lst.de (Olaf Kirch)
Date: Thu, 24 Apr 2008 11:10:50 +0200
Subject: [ofa-general] ***SPAM*** Re: [PATCH 3/8]: RDS: Reduce struct rds_ib_send_work size
In-Reply-To: <200804241109.52448.okir@lst.de>
References: <200804241106.57172.okir@lst.de> <200804241108.58748.okir@lst.de> <200804241109.52448.okir@lst.de>
Message-ID: <200804241110.51026.okir@lst.de>

From 8fcaa7d5000c8e3b2b7db235d2c279ccb98a6dec Mon Sep 17 00:00:00 2001
From: Olaf Kirch
Date: Thu, 24 Apr 2008 00:27:35 -0700
Subject: [PATCH] RDS: reduce struct rds_ib_send_work size

Currently, struct rds_ib_send_work contains an array of 29 ib_sge's, making the total size of each entry around 512 bytes. This severely limits the maximum size of the send WQ, as we allocate the array of work entries via one kmalloc() call. Another problem with this approach is that SENDs never use more than 2 SGEs anyway, so the SGE array is only ever fully utilized by RDMA ops.

Change this to 8 SGEs, which seems to be a better balance. An alternative (but much more intrusive) patch would have been to replace the s_sge array with a pointer and allocate the sge array dynamically. For OFED 1.3.1 I chose the more conservative approach.
Signed-off-by: Olaf Kirch --- net/rds/ib.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/rds/ib.h b/net/rds/ib.h index eda4a68..fd0b2d8 100644 --- a/net/rds/ib.h +++ b/net/rds/ib.h @@ -10,7 +10,7 @@ #define RDS_FMR_SIZE 256 #define RDS_FMR_POOL_SIZE 2048 -#define RDS_IB_MAX_SGE 29 +#define RDS_IB_MAX_SGE 8 #define RDS_IB_RECV_SGE 2 /* -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:14:03 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:14:03 +0200 Subject: [ofa-general] ***SPAM*** Re: [PATCH 8/8]: rds-tools: add new rds-ping utility In-Reply-To: <200804241113.08841.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241112.19866.okir@lst.de> <200804241113.08841.okir@lst.de> Message-ID: <200804241114.03810.okir@lst.de> From 01d43fd80fe8ca463ec01c073bf3d8c03c7daa26 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:49:37 -0700 Subject: [PATCH] Add new rds-ping utility This adds a new utility that acts a lot like the traditional ping command, but uses RDS instead of ICMP. Its main purpose is to have a simple tool to check the reachability of remote nodes. The required kernel patch is posted separately. 
Signed-off-by: Olaf Kirch --- Makefile.in | 3 +- rds-ping.1 | 69 +++++++++++ rds-ping.c | 385 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 456 insertions(+), 1 deletions(-) create mode 100644 rds-ping.1 create mode 100644 rds-ping.c diff --git a/Makefile.in b/Makefile.in index 7cad5f1..363bb58 100644 --- a/Makefile.in +++ b/Makefile.in @@ -24,7 +24,7 @@ else COMMON_OBJECTS = $(subst .c,.o,$(filter-out pfhack.c,$(COMMON_SOURCES))) endif -PROGRAMS = rds-gen rds-sink rds-info rds-stress +PROGRAMS = rds-gen rds-sink rds-info rds-stress rds-ping all-programs: $(PROGRAMS) @@ -65,6 +65,7 @@ EXTRA_DIST := rds-info.1 \ rds-gen.1 \ rds-sink.1 \ rds-stress.1 \ + rds-ping.1 \ rds.7 \ rds-rdma.7 \ Makefile.in \ diff --git a/rds-ping.1 b/rds-ping.1 new file mode 100644 index 0000000..ae06787 --- /dev/null +++ b/rds-ping.1 @@ -0,0 +1,69 @@ +.Dd Apr 22, 2008 +.Dt RDS-PING 1 +.Os +.Sh NAME +.Nm rds-ping +.Nd test reachability of remote node over RDS +.Pp +.Sh SYNOPSIS +.Nm rds-ping +.Bk -words +.Op Fl c Ar count +.Op Fl i Ar interval +.Op Fl I Ar local_addr +.Ar remote_addr + +.Sh DESCRIPTION +.Nm rds-ping +is used to test whether a remote node is reachable over RDS. +Its interface is designed to operate pretty much the standard +.Xr ping 8 +utility, even though the way it works is pretty different. +.Pp +.Nm rds-ping +opens several RDS sockets and sends packets to port 0 on +the indicated host. This is a special port number to which +no socket is bound; instead, the kernel processes incoming +packets and responds to them. +.Sh OPTIONS +The following options are available for use on the command line: +.Bl -tag -width Ds +.It Fl c Ar count +Causes +.Nm rds-ping +to exit after sending (and receiving) the specified number of +packets. +.It Fl I Ar address +By default, +.Nm rds-ping +will pick the local source address for the RDS socket based +on routing information for the destination address (i.e. 
if +packets to the given destination would be routed through interface +.Nm ib0 , +then it will use the IP address of +.Nm ib0 +as source address). +Using the +.Fl I +option, you can override this choice. +.It Fl i Ar timeout +By default, +.Nm rds-ping +will wait for one second between sending packets. Use this option +to specified a different interval. The timeout value is given in +seconds, and can be a floating point number. Optionally, append +.Nm msec +or +.Nm usec +to specify a timeout in milliseconds or microseconds, respectively. +.It +Specifying a timeout considerably smaller than the packet round-trip +time will produce unexpected results. +.El +.Sh AUTHORS +.Nm rds-ping +was written by Olaf Kirch . +.Sh SEE ALSO +.Xr rds 7 , +.Xr rds-info 1 , +.Xr rds-stress 1 . diff --git a/rds-ping.c b/rds-ping.c new file mode 100644 index 0000000..e9c88fc --- /dev/null +++ b/rds-ping.c @@ -0,0 +1,385 @@ +/* + * rds-ping utility + * + * Test reachability of a remote RDS node by sending a packet to port 0. + * + * Copyright (C) 2008 Oracle. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "net/rds.h" + +#ifdef DYNAMIC_PF_RDS +#include "pfhack.h" +#endif + +#define die(fmt...) do { \ + fprintf(stderr, fmt); \ + exit(1); \ +} while (0) + +#define die_errno(fmt, args...) do { \ + fprintf(stderr, fmt ", errno: %d (%s)\n", ##args , errno,\ + strerror(errno)); \ + exit(1); \ +} while (0) + +static struct timeval opt_wait = { 1, 1 }; /* 1s */ +static unsigned long opt_count; +static struct in_addr opt_srcaddr; +static struct in_addr opt_dstaddr; + +/* For reasons of simplicity, RDS ping does not use a packet + * payload that is being echoed, the way ICMP does. + * Instead, we open a number of sockets on different ports, and + * match packet sequence numbers with ports. 
+ */ +#define NSOCKETS 8 + +struct socket { + int fd; + unsigned int sent_id; + struct timeval sent_ts; + unsigned int nreplies; +}; + + +static int do_ping(void); +static void report_packet(struct socket *sp, const struct timeval *now, + const struct in_addr *from, int err); +static void usage(const char *complaint); +static int rds_socket(struct in_addr *src, struct in_addr *dst); +static int parse_timeval(const char *, struct timeval *); +static int parse_long(const char *ptr, unsigned long *); +static int parse_addr(const char *ptr, struct in_addr *); + +int +main(int argc, char **argv) +{ + int c; + + while ((c = getopt(argc, argv, "c:i:I:")) != -1) { + switch (c) { + case 'c': + if (!parse_long(optarg, &opt_count)) + die("Bad packet count <%s>\n", optarg); + break; + + case 'I': + if (!parse_addr(optarg, &opt_srcaddr)) + die("Unknown source address <%s>\n", optarg); + break; + + case 'i': + if (!parse_timeval(optarg, &opt_wait)) + die("Bad wait time <%s>\n", optarg); + break; + + default: + usage("Unknown option"); + } + } + + if (optind + 1 != argc) + usage("Missing destination address"); + if (!parse_addr(argv[optind], &opt_dstaddr)) + die("Cannot parse destination address <%s>\n", argv[optind]); + + return do_ping(); +} + +/* returns a - b in usecs */ +static inline long +usec_sub(const struct timeval *a, const struct timeval *b) +{ + return ((long)(a->tv_sec - b->tv_sec) * 1000000UL) + a->tv_usec - b->tv_usec; +} + +static int +do_ping(void) +{ + struct sockaddr_in sin; + unsigned int sent = 0, recv = 0; + struct timeval next_ts; + struct socket socket[NSOCKETS]; + struct pollfd pfd[NSOCKETS]; + int i, next = 0; + + for (i = 0; i < NSOCKETS; ++i) { + int fd; + + fd = rds_socket(&opt_srcaddr, &opt_dstaddr); + + socket[i].fd = fd; + pfd[i].fd = fd; + pfd[i].events = POLLIN; + } + + memset(&sin, 0, sizeof(sin)); + sin.sin_family = AF_INET; + sin.sin_addr = opt_dstaddr; + + gettimeofday(&next_ts, NULL); + while (1) { + struct timeval now; + struct sockaddr_in 
from; + socklen_t alen = sizeof(from); + long deadline; + int ret; + + /* Fast way out - if we have received all packets, bail now. + * If we're still waiting for some to come back, we need + * to do the poll() below */ + if (opt_count && recv >= opt_count) + break; + + gettimeofday(&now, NULL); + if (timercmp(&now, &next_ts, >=)) { + struct socket *sp = &socket[next]; + int err = 0; + + if (opt_count && sent >= opt_count) + break; + + timeradd(&next_ts, &opt_wait, &next_ts); + if (sendto(sp->fd, NULL, 0, 0, (struct sockaddr *) &sin, sizeof(sin))) + err = errno; + sp->sent_id = ++sent; + sp->sent_ts = now; + sp->nreplies = 0; + next = (next + 1) % NSOCKETS; + + if (err) { + static unsigned int nerrs = 0; + + report_packet(sp, NULL, NULL, err); + if (err == EINVAL && nerrs++ == 0) + printf(" Maybe your kernel does not support rds ping yet\n"); + } + } + + deadline = usec_sub(&next_ts, &now); + ret = poll(pfd, NSOCKETS, deadline / 1000); + if (ret < 0) { + if (errno == EINTR) + continue; + die_errno("poll"); + } + if (ret == 0) + continue; + + for (i = 0; i < NSOCKETS; ++i) { + struct socket *sp = &socket[i]; + + if (!(pfd[i].revents & POLLIN)) + continue; + + ret = recvfrom(sp->fd, NULL, 0, MSG_DONTWAIT, + (struct sockaddr *) &from, &alen); + gettimeofday(&now, NULL); + + if (ret < 0) { + if (errno != EAGAIN && + errno != EINTR) + report_packet(sp, &now, NULL, errno); + } else { + report_packet(sp, &now, &from.sin_addr, 0); + recv++; + } + } + } + + /* Program exit code: signal success if we received any response. 
*/ + return recv == 0; +} + +static void +report_packet(struct socket *sp, const struct timeval *now, + const struct in_addr *from_addr, int err) +{ + printf(" %3u:", sp->sent_id); + if (now) + printf(" %ld usec", usec_sub(now, &sp->sent_ts)); + if (from_addr && from_addr->s_addr != opt_dstaddr.s_addr) + printf(" (%s)", inet_ntoa(*from_addr)); + if (sp->nreplies) + printf(" DUP!"); + if (err) + printf(" ERROR: %s", strerror(err)); + printf("\n"); + + sp->nreplies++; +} + +static int +rds_socket(struct in_addr *src, struct in_addr *dst) +{ + struct sockaddr_in sin; + int fd; + + memset(&sin, 0, sizeof(sin)); + sin.sin_family = AF_INET; + + fd = socket(PF_RDS, SOCK_SEQPACKET, 0); + if (fd < 0) + die_errno("unable to create RDS socket"); + + /* Guess the local source addr if not given. */ + if (src->s_addr == 0) { + socklen_t alen; + int ufd; + + ufd = socket(PF_INET, SOCK_DGRAM, 0); + if (ufd < 0) + die_errno("unable to create UDP socket"); + sin.sin_addr = *dst; + sin.sin_port = htons(1); + if (connect(ufd, (struct sockaddr *) &sin, sizeof(sin)) < 0) + die_errno("unable to connect to %s", + inet_ntoa(*dst)); + + alen = sizeof(sin); + if (getsockname(ufd, (struct sockaddr *) &sin, &alen) < 0) + die_errno("getsockname failed"); + + *src = sin.sin_addr; + close(ufd); + } + + sin.sin_addr = *src; + sin.sin_port = 0; + + if (bind(fd, (struct sockaddr *) &sin, sizeof(sin))) + die_errno("bind() failed"); + + return fd; +} + +static void +usage(const char *complaint) +{ + fprintf(stderr, + "%s\nUsage: rds-ping [options] dst_addr\n" + "Options:\n" + " -c count limit packet count\n" + " -i interval interval between packets (default 1 sec)\n" + " -I interface source IP address\n", + complaint); + exit(1); +} + +static int +parse_timeval(const char *ptr, struct timeval *ret) +{ + double seconds; + char *endptr; + + seconds = strtod(ptr, &endptr); + if (!strcmp(endptr, "ms") + || !strcmp(endptr, "msec")) { + seconds *= 1e-3; + } else + if (!strcmp(endptr, "us") + || !strcmp(endptr, "usec")) { + seconds *= 1e-6; + } else if (*endptr) + 
return 0; + + ret->tv_sec = (long) seconds; + seconds -= ret->tv_sec; + + ret->tv_usec = (long) (seconds * 1e6); + return 1; +} + +static int +parse_long(const char *ptr, unsigned long *ret) +{ + unsigned long long val; + char *endptr; + + val = strtoull(ptr, &endptr, 0); + switch (*endptr) { + case 'k': case 'K': + val <<= 10; + endptr++; + break; + + case 'm': case 'M': + val <<= 20; + endptr++; + break; + + case 'g': case 'G': + val <<= 30; + endptr++; + break; + } + + if (*endptr) + return 0; + + *ret = val; + return 1; +} + +static int +parse_addr(const char *ptr, struct in_addr *ret) +{ + struct hostent *hent; + + hent = gethostbyname(ptr); + if (hent && + hent->h_addrtype == AF_INET && hent->h_length == sizeof(*ret)) { + memcpy(ret, hent->h_addr, sizeof(*ret)); + return 1; + } + + return 0; +} + +/* + * These are completely stupid. options.c should be removed. + */ +void print_usage(int durr) { } +void print_version() { } -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From andrea at qumranet.com Thu Apr 24 08:39:43 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 24 Apr 2008 17:39:43 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080424095112.GC30298@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> Message-ID: <20080424153943.GJ24536@duo.random> On Thu, Apr 24, 2008 at 04:51:12AM -0500, Robin Holt wrote: > It seems to me the work done by mmu_notifier_mm_destroy should really > be done inside the mm_lock()/mm_unlock area of mmu_unregister and There's no mm_lock/unlock for mmu_unregister anymore. That's the whole point of using srcu so it becomes reliable and quick.
> mm_notifier_release when we have removed the last entry. That would > give the users job the same performance after they are done using the > special device that they had prior to its use. That's not feasible. Otherwise mmu_notifier_mm could go away at any time, both under _release from exit_mmap and under _unregister. exit_mmap holds an implicit mm_count, so freeing mmu_notifier_mm after the last mmdrop makes it safe. mmu_notifier_unregister also holds the mm_count because mm_count was pinned by mmu_notifier_register. That solves the issue with mmu_notifier_mm going away from under mmu_notifier_unregister and _release, and that's why it can only be freed after mm_count == 0. There's at least one small issue I noticed so far: while _release doesn't need to care about _register, _unregister definitely needs to care about _register. I have to take the mmap_sem in addition to, or in place of, the unregister_lock. The srcu_read_lock can also likely be moved just before releasing the unregister_lock, but that's just a minor optimization to make the code more strict. > On Thu, Apr 24, 2008 at 08:49:40AM +0200, Andrea Arcangeli wrote: > ... > > diff --git a/mm/memory.c b/mm/memory.c > > --- a/mm/memory.c > > +++ b/mm/memory.c > ... > > @@ -603,25 +605,39 @@ > > * readonly mappings. The tradeoff is that copy_page_range is more > > * efficient than faulting. > > */ > > + ret = 0; > > if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { > > if (!vma->anon_vma) > > - return 0; > > + goto out; > > } > > > > - if (is_vm_hugetlb_page(vma)) > > - return copy_hugetlb_page_range(dst_mm, src_mm, vma); > > + if (unlikely(is_vm_hugetlb_page(vma))) { > > + ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); > > + goto out; > > + } > > > > + if (is_cow_mapping(vma->vm_flags)) > > + mmu_notifier_invalidate_range_start(src_mm, addr, end); > > + > > + ret = 0; > > I don't think this is needed.
It's not needed right, but I thought it was cleaner if they all use "ret" after I had to change the code at the end of the function. Anyway I'll delete this to make the patch shorter and only change the minimum, agreed. > ... > > +/* avoid memory allocations for mm_unlock to prevent deadlock */ > > +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) > > +{ > > + if (mm->map_count) { > > + if (data->nr_anon_vma_locks) > > + mm_unlock_vfree(data->anon_vma_locks, > > + data->nr_anon_vma_locks); > > + if (data->i_mmap_locks) > > I think you really want data->nr_i_mmap_locks. Indeed. It never happens that there are zero vmas with filebacked mappings, this is why this couldn't be triggered in practice, thanks! > The second paragraph of this comment seems extraneous. ok removed. > > + /* > > + * Wait ->release if mmu_notifier_unregister run list_del_rcu. > > + * srcu can't go away from under us because one mm_count is > > + * hold by exit_mmap. > > + */ > > These two sentences don't make any sense to me. Well that was a short explanation of why the mmu_notifier_mm structure can only be freed after the last mmdrop, which is what you asked at the top. I'll try to rephrase. > > +void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) > > +{ > > + int before_release = 0, srcu; > > + > > + BUG_ON(atomic_read(&mm->mm_count) <= 0); > > + > > + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); > > + spin_lock(&mm->mmu_notifier_mm->unregister_lock); > > + if (!hlist_unhashed(&mn->hlist)) { > > + hlist_del_rcu(&mn->hlist); > > + before_release = 1; > > + } > > + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); > > + if (before_release) > > + /* > > + * exit_mmap will block in mmu_notifier_release to > > + * guarantee ->release is called before freeing the > > + * pages. > > + */ > > + mn->ops->release(mn, mm); > > I am not certain about the need to do the release callout when the driver > has already told this subsystem it is done. 
For XPMEM, this callout > would immediately return. I would expect it to be the same for GRU. The point is that you don't want to run it twice. And without this you will have to serialize against ->release yourself in the driver. It's much more convenient if you know that ->release will be called just once, and before mmu_notifier_unregister returns. It could be called by _release even after you're already inside _unregister; _release may reach the spinlock before _unregister, and you won't notice the difference. Solving this race in the driver looked too complex, so I'd rather solve it once inside the mmu notifier code to be sure. Missing a release event is fatal because all sptes have to be dropped before _release returns. The requirement is the same for _unregister: all sptes have to be dropped before it returns. ->release should be able to sleep as long as it wants even with only 1/N applied. exit_mmap can sleep too, no problem. You can't unregister inside ->release, first of all because the 'ret' instruction must still be allocated to return to the mmu notifier code. From ogerlitz at voltaire.com Thu Apr 24 08:53:48 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 24 Apr 2008 18:53:48 +0300 Subject: [ofa-general] Re: [PATCH 6/8]: RDS: Use IB for loopback In-Reply-To: <200804241112.19866.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241111.26726.okir@lst.de> <200804241111.56693.okir@lst.de> <200804241112.19866.okir@lst.de> Message-ID: <4810AD0C.9000305@voltaire.com> Olaf Kirch wrote: > In order to be able to test RDS (and RDS over RDMA) faithfully > on standalone machines, we want loopback traffic to use the IB > transport if possible. Olaf, Beyond the details of this patch, one thing which should be on the table here is the IB RC LID matching rules, which state that when an RC QP is configured it is set with the source and destination LIDs, and if a packet is received whose LRH does not match these LIDs, it is dropped.
In other words, if during the test the standalone machine becomes attached to an IB fabric and LIDs are assigned to the ports by the SM, the loopback connection would get broken. Generally speaking, with RDS this should not be a big deal, since it will reconnect, but I just wanted to make sure we are all aware of this. Or From xavier at tddft.org Thu Apr 24 09:46:27 2008 From: xavier at tddft.org (Xavier Andrade) Date: Thu, 24 Apr 2008 18:46:27 +0200 (CEST) Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: References: Message-ID: Hi, On Wed, 23 Apr 2008, Roland Dreier wrote: > Hmm, not sure... let's see what the Mellanox guys say (they're mostly on > vacation this week so it might be a few days). > > The only things I can think of to try are: > - go to mellanox.com and get latest FW and make sure there's not > anything strange about what's on your card (but given that it is seen > by the driver, the FW must at least have a valid checksum I think) > I can't locate the correct firmware; the PSID reported by mstflint corresponds to an Intel one: Image type: Failsafe I.S. Version: 1 Chip Revision: A0 Description: Node Port1 Sys image GUIDs: 0002c9020022baa4 0002c9020022baa5 0002c9020022baa7 Board ID: (INT0010000001) VSD: PSID: INT0010000001 But I haven't been able to find any firmware on Intel's webpage. Do you think that I could use a Mellanox firmware? Which one? There are three different ones for the MT25204. > - if you're building your own kernel, try the Debian 2.6.24 generic > amd64 image and see if that's any different, because I definitely > have mt25204 HCAs working with that. > I tried with 2.6.18 (the default etch kernel) and it gave the same problem. Finally, I upgraded the BIOS of the machine (to version 85 from 79) and now the module loads without problems and everything works correctly. So it was probably a motherboard issue.
Regards, Xavier From weiny2 at llnl.gov Thu Apr 24 09:57:52 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 24 Apr 2008 09:57:52 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080424095752.416d5d55.weiny2@llnl.gov> On Wed, 23 Apr 2008 18:27:21 -0700 Hal Rosenstock wrote: > On Wed, 2008-04-23 at 17:05 -0700, Hal Rosenstock wrote: > > On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote: > > > Hey all, > > > > > > > > Thoughts? > > > > Having OpenSM request client reregistration (used in other places by > > OpenSM) of such nodes will resolve this issue. As little or as much > > policy can be built into OpenSM in determining "such" nodes to scope > > down the application of this mechanism for this case. > > One side comment on the non OpenSM aspect of this: > > Why is the node temporarily unavailable ? There is a "contract" that the > node makes with the SM that it clearly isn't honoring. Is any > investigation going on relative to this aspect of the issue ? > Yes, we are working on finding the root cause. I agree that the "contract" is not being honored. This is one of the reasons I was hesitant to implement any fix to be submitted. I don't think this is truly a bug in the stack. However, I could see this causing issues for people[*] and it might be nice to have a "fix". Ira [*] Particularly those who do not have any other connection to nodes other than IB. 
From michael.heinz at qlogic.com Thu Apr 24 10:15:30 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Thu, 24 Apr 2008 12:15:30 -0500 Subject: [ofa-general] [PATCH 1/1] RPM Spec files Message-ID: Installation of OFED 1.3.0.0.4 onto a Kusu/OCS cluster does not fully succeed because of some missing dependencies in the RPM spec files. This is because Kusu installs nodes over a network by presenting a pool of RPMs to be installed and letting RPM figure out the order to install them in. Without the dependencies we ended up with oddities like the kernel drivers being installed before the /usr/bin directory had been populated, causing the install script to fail. I was able to work around this by manually expanding some of the source RPM files, altering the spec file and repackaging the source RPM. This allowed me to build binary RPMs (via the install script) that could be installed on a Kusu cluster. Here are the proposed changes. If there is a better/preferred way of submitting this suggestion, please let me know. 
--- ../../original/ib-bonding.spec 2008-04-22 12:54:12.000000000 -0400 +++ ib-bonding.spec 2008-04-22 12:43:07.000000000 -0400 @@ -20,6 +20,7 @@ Group : Applications/System License : GPL BuildRoot: %{_tmppath}/%{name}-%{version}-root +PreReq : coreutils %description This package provides a bonding device which is capable of enslaving --- ../../original/ofa_kernel.spec 2008-04-22 12:54:13.000000000 -0400 +++ ofa_kernel.spec 2008-04-22 12:45:40.000000000 -0400 @@ -111,6 +111,9 @@ BuildRequires: sysfsutils-devel %package -n kernel-ib +PreReq: coreutils +PreReq: kernel +PreReq: pciutils Version: %{_version} Release: %{krelver} Summary: Infiniband Driver and ULPs kernel modules @@ -119,6 +122,10 @@ Core, HW and ULPs kernel modules %package -n kernel-ib-devel +PreReq: coreutils +PreReq: kernel +PreReq: pciutils +Requires: kernel-ib Version: %{_version} Release: %{krelver} Summary: Infiniband Driver and ULPs kernel modules sources --- ../../original/open-iscsi-generic.spec 2008-04-22 12:54:13.000000000 -0400 +++ open-iscsi-generic.spec 2008-04-22 12:42:33.000000000 -0400 @@ -21,6 +21,7 @@ %define kversion $(uname -r | sed "s/-ppc64\|-smp//") %package -n iscsi-initiator-utils +PreReq: coreutils Summary : iSCSI daemon and utility programs Group : System Environment/Daemons %description -n iscsi-initiator-utils @@ -30,6 +31,7 @@ Protocol networks. 
%package -n open-iscsi +PreReq: coreutils Summary : Linux* Open-iSCSI Software Initiator Group : Productivity/Networking/Other %description -n open-iscsi -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania From andrea at qumranet.com Thu Apr 24 10:41:45 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 24 Apr 2008 19:41:45 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080424153943.GJ24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> Message-ID: <20080424174145.GM24536@duo.random> On Thu, Apr 24, 2008 at 05:39:43PM +0200, Andrea Arcangeli wrote: > There's at least one small issue I noticed so far: while _release > doesn't need to care about _register, _unregister definitely needs to > care about _register. I have to take the mmap_sem in addition to, or in In the end the best approach is to use the spinlock around those list_add/list_del calls; they all run in O(1) with the hlist and take only a few asm insns. This also avoids taking the mmap_sem in exit_mmap; at exit_mmap time nobody should need to use mmap_sem anymore. It might work, but this looks cleaner. The lock is dynamically allocated only when the notifiers are registered, so the few bytes taken by it aren't relevant. A full new update will soon become visible here: http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14-pre3/ Please have a close look again. Your help is extremely appreciated and very helpful as usual! Thanks a lot.
diff -urN xxx/include/linux/mmu_notifier.h xx/include/linux/mmu_notifier.h --- xxx/include/linux/mmu_notifier.h 2008-04-24 19:41:15.000000000 +0200 +++ xx/include/linux/mmu_notifier.h 2008-04-24 19:38:37.000000000 +0200 @@ -15,7 +15,7 @@ struct hlist_head list; struct srcu_struct srcu; /* to serialize mmu_notifier_unregister against mmu_notifier_release */ - spinlock_t unregister_lock; + spinlock_t lock; }; struct mmu_notifier_ops { diff -urN xxx/mm/memory.c xx/mm/memory.c --- xxx/mm/memory.c 2008-04-24 19:41:15.000000000 +0200 +++ xx/mm/memory.c 2008-04-24 19:38:37.000000000 +0200 @@ -605,16 +605,13 @@ * readonly mappings. The tradeoff is that copy_page_range is more * efficient than faulting. */ - ret = 0; if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { if (!vma->anon_vma) - goto out; + return 0; } - if (unlikely(is_vm_hugetlb_page(vma))) { - ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); - goto out; - } + if (is_vm_hugetlb_page(vma)) + return copy_hugetlb_page_range(dst_mm, src_mm, vma); if (is_cow_mapping(vma->vm_flags)) mmu_notifier_invalidate_range_start(src_mm, addr, end); @@ -636,7 +633,6 @@ if (is_cow_mapping(vma->vm_flags)) mmu_notifier_invalidate_range_end(src_mm, vma->vm_start, end); -out: return ret; } diff -urN xxx/mm/mmap.c xx/mm/mmap.c --- xxx/mm/mmap.c 2008-04-24 19:41:15.000000000 +0200 +++ xx/mm/mmap.c 2008-04-24 19:38:37.000000000 +0200 @@ -2381,7 +2381,7 @@ if (data->nr_anon_vma_locks) mm_unlock_vfree(data->anon_vma_locks, data->nr_anon_vma_locks); - if (data->i_mmap_locks) + if (data->nr_i_mmap_locks) mm_unlock_vfree(data->i_mmap_locks, data->nr_i_mmap_locks); } diff -urN xxx/mm/mmu_notifier.c xx/mm/mmu_notifier.c --- xxx/mm/mmu_notifier.c 2008-04-24 19:41:15.000000000 +0200 +++ xx/mm/mmu_notifier.c 2008-04-24 19:31:23.000000000 +0200 @@ -24,22 +24,16 @@ * zero). All other tasks of this mm already quit so they can't invoke * mmu notifiers anymore. 
This can run concurrently only against * mmu_notifier_unregister and it serializes against it with the - * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go - * away from under us as the exit_mmap holds a mm_count pin itself. - * - * The ->release method can't allow the module to be unloaded, the - * module can only be unloaded after mmu_notifier_unregister run. This - * is because the release method has to run the ret instruction to - * return back here, and so it can't allow the ret instruction to be - * freed. + * mmu_notifier_mm->lock in addition to RCU. struct mmu_notifier_mm + * can't go away from under us as exit_mmap holds a mm_count pin + * itself. */ void __mmu_notifier_release(struct mm_struct *mm) { struct mmu_notifier *mn; int srcu; - srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); - spin_lock(&mm->mmu_notifier_mm->unregister_lock); + spin_lock(&mm->mmu_notifier_mm->lock); while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) { mn = hlist_entry(mm->mmu_notifier_mm->list.first, struct mmu_notifier, @@ -52,23 +46,28 @@ */ hlist_del_init(&mn->hlist); /* + * SRCU here will block mmu_notifier_unregister until + * ->release returns. + */ + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + spin_unlock(&mm->mmu_notifier_mm->lock); + /* * if ->release runs before mmu_notifier_unregister it * must be handled as it's the only way for the driver - * to flush all existing sptes before the pages in the - * mm are freed. + * to flush all existing sptes and stop the driver + * from establishing any more sptes before all the + * pages in the mm are freed. 
*/ - spin_unlock(&mm->mmu_notifier_mm->unregister_lock); - /* SRCU will block mmu_notifier_unregister */ mn->ops->release(mn, mm); - spin_lock(&mm->mmu_notifier_mm->unregister_lock); + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + spin_lock(&mm->mmu_notifier_mm->lock); } - spin_unlock(&mm->mmu_notifier_mm->unregister_lock); - srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + spin_unlock(&mm->mmu_notifier_mm->lock); /* - * Wait ->release if mmu_notifier_unregister run list_del_rcu. - * srcu can't go away from under us because one mm_count is - * hold by exit_mmap. + * Wait ->release if mmu_notifier_unregister is running it. + * The mmu_notifier_mm can't go away from under us because one + * mm_count is hold by exit_mmap. */ synchronize_srcu(&mm->mmu_notifier_mm->srcu); } @@ -177,11 +176,19 @@ goto out_unlock; } INIT_HLIST_HEAD(&mm->mmu_notifier_mm->list); - spin_lock_init(&mm->mmu_notifier_mm->unregister_lock); + spin_lock_init(&mm->mmu_notifier_mm->lock); } atomic_inc(&mm->mm_count); + /* + * Serialize the update against mmu_notifier_unregister. A + * side note: mmu_notifier_release can't run concurrently with + * us because we hold the mm_users pin (either implicitly as + * current->mm or explicitly with get_task_mm() or similar). + */ + spin_lock(&mm->mmu_notifier_mm->lock); hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list); + spin_unlock(&mm->mmu_notifier_mm->lock); out_unlock: mm_unlock(mm, &data); out: @@ -215,23 +222,32 @@ BUG_ON(atomic_read(&mm->mm_count) <= 0); - srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); - spin_lock(&mm->mmu_notifier_mm->unregister_lock); + spin_lock(&mm->mmu_notifier_mm->lock); if (!hlist_unhashed(&mn->hlist)) { hlist_del_rcu(&mn->hlist); before_release = 1; } - spin_unlock(&mm->mmu_notifier_mm->unregister_lock); if (before_release) /* + * SRCU here will force exit_mmap to wait ->release to finish + * before freeing the pages. 
+ */ + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + spin_unlock(&mm->mmu_notifier_mm->lock); + if (before_release) { + /* * exit_mmap will block in mmu_notifier_release to * guarantee ->release is called before freeing the * pages. */ mn->ops->release(mn, mm); - srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + } - /* wait any running method to finish, including ->release */ + /* + * Wait any running method to finish, of course including + * ->release if it was run by mmu_notifier_relase instead of us. + */ synchronize_srcu(&mm->mmu_notifier_mm->srcu); BUG_ON(atomic_read(&mm->mm_count) <= 0); From hrosenstock at xsigo.com Thu Apr 24 12:07:03 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 24 Apr 2008 12:07:03 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080424095752.416d5d55.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> <20080424095752.416d5d55.weiny2@llnl.gov> Message-ID: <1209064023.689.249.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-24 at 09:57 -0700, Ira Weiny wrote: > > One side comment on the non OpenSM aspect of this: > > > > Why is the node temporarily unavailable ? There is a "contract" that the > > node makes with the SM that it clearly isn't honoring. Is any > > investigation going on relative to this aspect of the issue ? > > > > Yes, we are working on finding the root cause. I agree that the "contract" is > not being honored. This is one of the reasons I was hesitant to implement any > fix to be submitted. I think the two issues can be tackled in parallel. > I don't think this is truly a bug in the stack. Any ideas on what it is ? If not, would you be willing to try something assuming the end node issue is easily reproducible ? 
> However, I could see this causing issues for people[*] and it might be nice to > have a "fix". Sure; both are issues which should be understood better and fixed IMO. -- Hal > Ira > > [*] Particularly those who do not have any other connection to nodes other than > IB. From Brian.Murrell at Sun.COM Thu Apr 24 12:28:00 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Thu, 24 Apr 2008 15:28:00 -0400 Subject: [ofa-general] kernel-ib on rhel5 Message-ID: <1209065280.18036.216.camel@pc.ilinx> I wonder, what is the strategy for kernel-ib to exist on a machine with the standard RHEL5 kernel installed. The standard RHEL5 kernel of course includes an OFED release and as such has modules of the same name as the OFED ones. I can see that by default, the ofa_kernel.spec installs its modules into /lib/modules/%{KVERSION}/updates, but how does that ensure that when a kernel module is loaded with modprobe, the one in /lib/modules/%{KVERSION}/updates will be preferred over the one in /lib/modules/%{KVERSION}/ (i.e. the one provided by the RHEL5 kernel RPM)? Thanx, b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From swise at opengridcomputing.com Thu Apr 24 12:57:30 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 24 Apr 2008 14:57:30 -0500 Subject: [ofa-general] dapl bug? Message-ID: <4810E62A.6070807@opengridcomputing.com> Hey Arlin, Have you ever seen this? I hit this 100% of the time trying the 1.2 version of dapltest on an ofed-1.3 system. The debug info below was obtained by building the src rpm with debug enabled...
> (gdb) r -T T -d -s vic11-10g -D chelsio -i 10 client SR 256 server SR > 256 client SR 256 server SR 256 > Starting program: /usr/bin/dapltest -T T -d -s vic11-10g -D chelsio -i > 10 client SR 256 server SR 256 client SR 256 server SR 256 > [Thread debugging using libthread_db enabled] > [New Thread 46912498371600 (LWP 6654)] > ------------------------------------- > TransCmd.server_name : vic11-10g > TransCmd.num_iterations : 10 > TransCmd.num_threads : 1 > TransCmd.eps_per_thread : 1 > TransCmd.validate : 0 > TransCmd.dapl_name : chelsio > TransCmd.num_ops : 4 > TransCmd.op[0].transfer_type : SEND_RECV (client) > TransCmd.op[0].seg_size : 256 > TransCmd.op[0].num_segs : 1 > TransCmd.op[0].reap_send_on_recv : 0 > TransCmd.op[1].transfer_type : SEND_RECV (server) > TransCmd.op[1].seg_size : 256 > TransCmd.op[1].num_segs : 1 > TransCmd.op[1].reap_send_on_recv : 0 > TransCmd.op[2].transfer_type : SEND_RECV (client) > TransCmd.op[2].seg_size : 256 > TransCmd.op[2].num_segs : 1 > TransCmd.op[2].reap_send_on_recv : 0 > TransCmd.op[3].transfer_type : SEND_RECV (server) > TransCmd.op[3].seg_size : 256 > TransCmd.op[3].num_segs : 1 > TransCmd.op[3].reap_send_on_recv : 0 > Server Name: vic11-10g > > Program received signal SIGSEGV, Segmentation fault. 
> [Switching to Thread 46912498371600 (LWP 6654)] > 0x00000032f04760b0 in strlen () from /lib64/libc.so.6 > (gdb) bt > #0 0x00000032f04760b0 in strlen () from /lib64/libc.so.6 > #1 0x00000032f044602b in vfprintf () from /lib64/libc.so.6 > #2 0x00000032f044bdea in printf () from /lib64/libc.so.6 > #3 0x0000000000403900 in DT_NetAddrLookupHostAddress > (to_netaddr=0x7e16f88, hostname=0x7e1658c "vic11-10g") at > cmd/dapl_netaddr.c:136 > #4 0x00000000004026cb in DT_Params_Parse (argc=, > argv=, params_ptr=0x7e16580) at cmd/dapl_params.c:205 > #5 0x000000000040211f in dapltest (argc=22, argv=0x7fff48e9b5f8) at > cmd/dapl_main.c:88 > #6 0x00000032f041d8a4 in __libc_start_main () from /lib64/libc.so.6 > #7 0x0000000000401f59 in _start () > (gdb) It's hurling in DT_Mdep_printf() here: > 134 /* Pull out IP address and print it as a sanity check */ > 135 DT_Mdep_printf ("Server Name: %s \n", hostname); > 136 DT_Mdep_printf ("Server Net Address: %s\n", > 137 inet_ntoa(((struct sockaddr_in > *)target->ai_addr)->sin_addr)); The ai_addr looks ok though: > (gdb) p/x *((struct sockaddr_in *)target->ai_addr) > $3 = {sin_family = 0x2, sin_port = 0x0, sin_addr = {s_addr = > 0x8846a8c0}, sin_zero = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}} > (gdb) > Ever seen this? Steve. From Jeffrey.C.Becker at nasa.gov Thu Apr 24 13:00:24 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Thu, 24 Apr 2008 13:00:24 -0700 Subject: [ofa-general] Re: FW: [ewg] SPAM emails In-Reply-To: <55CE0347B98FCA468923E5FBC25CB4DC036FCB37@orsmsx413.amr.corp.intel.com> References: <55CE0347B98FCA468923E5FBC25CB4DC036FCB37@orsmsx413.amr.corp.intel.com> Message-ID: <4810E6D8.5020708@nasa.gov> I did see it, and have also been unhappy about the recent increase in spam. We do run spamassassin and amavis (for virus checking) on the server. However, configuring these is an ongoing project, as the attackers figure out how to get around whatever rules we do have.
It's also somewhat hit and miss, e.g., sometimes, I'll clamp down the rules and then perfectly valid and important posts get blocked. I'm happy to look at this again, but it might be useful to inquire at John Companies if they can supply a hardware firewall such as Barracuda. I think that's what NASA uses, as I see these spams in my quarantine daily. -jeff Ryan, Jim wrote: > > Not sure you would have seen this. The amount of spam has increased > dramatically. I wasn’t aware of viruses, but will trust HB’s judgment > on that. I know I get called on frequently to block email that’s spam. > I get several such requests per day > > > > Thanks, Jim > > > > ------------------------------------------------------------------------ > > *From:* ewg-bounces at lists.openfabrics.org > [mailto:ewg-bounces at lists.openfabrics.org] *On Behalf Of *Head Bubba > *Sent:* Thursday, April 24, 2008 10:37 AM > *To:* ewg at lists.openfabrics.org > *Subject:* [ewg] SPAM emails > > > > can we use a SPAM filter (two of the SPAMs, so far, had links to a > known virus that our internal email filtering caught, and 1 last week > was a virus that our internal email filtering also caught)- now that > virus are starting to be sent, can something be done before some one > puts in a legitimate subject line and sends a virus ? > > > > h.b. 
> > > > ============================================================================== > Please access the attached hyperlink for an important electronic communications disclaimer: > > http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html > ============================================================================== From PHF at zurich.ibm.com Thu Apr 24 13:05:02 2008 From: PHF at zurich.ibm.com (Philip Frey1) Date: Thu, 24 Apr 2008 22:05:02 +0200 Subject: [ofa-general] AE kernel messages decrypted (Chelsio RNIC T3) Message-ID: Hi, I sometimes see async events reported through /var/log/messages of the form post_qp_event - AE qpid 0x240 opcode 0 status 0x6 type 0 wrid.hi 0xff650000 wrid.lo 0x0 and the like. I am now looking for a more meaningful explanation of what is going wrong. After some grepping I ended up in ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_ev.c where this message is written to the log. Since there is still no explanation of what status 0x6 means, I continued my search and found cxio_wr.h. Can you point out to me which enums are printed here? Talking about async events: What would be the recommended way of surfacing those AEs at the user application? Many thanks and best regards, Philip -------------- next part -------------- An HTML attachment was scrubbed... URL: From arlin.r.davis at intel.com Thu Apr 24 13:21:48 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 24 Apr 2008 13:21:48 -0700 Subject: [ofa-general] RE: dapl bug [PATCH] dapltest: include definitions for inet_ntoa. In-Reply-To: <4810E62A.6070807@opengridcomputing.com> References: <4810E62A.6070807@opengridcomputing.com> Message-ID: Steve, Sorry, this was fixed in the v2.0 library but apparently it didn't get pushed back to v1.2. dapltest: include definitions for inet_ntoa. At load time the symbol was resolved, but with the default return type of int instead of char*, it caused a segfault. Add the correct include files in dapl_mdep_user.h for linux.
Signed-off-by: Arlin Davis
---
 test/dapltest/mdep/linux/dapl_mdep_user.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/test/dapltest/mdep/linux/dapl_mdep_user.h b/test/dapltest/mdep/linux/dapl_mdep_user.h
index 7fadbea..16170a7 100755
--- a/test/dapltest/mdep/linux/dapl_mdep_user.h
+++ b/test/dapltest/mdep/linux/dapl_mdep_user.h
@@ -43,6 +43,11 @@
 #include
 #include

+/* inet_ntoa */
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+
 /* Default Device Name */
 #define DT_MdepDeviceName "OpenIB-cma"
--
1.5.2.5

>-----Original Message----- >From: Steve Wise [mailto:swise at opengridcomputing.com] >Sent: Thursday, April 24, 2008 12:58 PM >To: Arlin Davis >Cc: OpenFabrics General >Subject: dapl bug? > >Hey Arlin, > >Have you ever seen this? I hit this 100% of the time trying the 1.2 >version of dapltest on an ofed-1.3 system. The debug info below was >obtained by builting the src rpm with debug enabled... >> (gdb) r -T T -d -s vic11-10g -D chelsio -i 10 client SR 256 >server SR >> 256 client SR 256 server SR 256 >> Starting program: /usr/bin/dapltest -T T -d -s vic11-10g -D >chelsio -i >> 10 client SR 256 server SR 256 client SR 256 server SR 256 >> [Thread debugging using libthread_db enabled] >> [New Thread 46912498371600 (LWP 6654)] >> ------------------------------------- >> TransCmd.server_name : vic11-10g >> TransCmd.num_iterations : 10 >> TransCmd.num_threads : 1 >> TransCmd.eps_per_thread : 1 >> TransCmd.validate : 0 >> TransCmd.dapl_name : chelsio >> TransCmd.num_ops : 4 >> TransCmd.op[0].transfer_type : SEND_RECV (client) >> TransCmd.op[0].seg_size : 256 >> TransCmd.op[0].num_segs : 1 >> TransCmd.op[0].reap_send_on_recv : 0 >> TransCmd.op[1].transfer_type : SEND_RECV (server) >> TransCmd.op[1].seg_size : 256 >> TransCmd.op[1].num_segs : 1 >> TransCmd.op[1].reap_send_on_recv : 0 >> TransCmd.op[2].transfer_type : SEND_RECV (client) >> TransCmd.op[2].seg_size : 256 >> TransCmd.op[2].num_segs : 1 >> TransCmd.op[2].reap_send_on_recv : 0 >>
TransCmd.op[3].transfer_type : SEND_RECV (server) >> TransCmd.op[3].seg_size : 256 >> TransCmd.op[3].num_segs : 1 >> TransCmd.op[3].reap_send_on_recv : 0 >> Server Name: vic11-10g >> >> Program received signal SIGSEGV, Segmentation fault. >> [Switching to Thread 46912498371600 (LWP 6654)] >> 0x00000032f04760b0 in strlen () from /lib64/libc.so.6 >> (gdb) bt >> #0 0x00000032f04760b0 in strlen () from /lib64/libc.so.6 >> #1 0x00000032f044602b in vfprintf () from /lib64/libc.so.6 >> #2 0x00000032f044bdea in printf () from /lib64/libc.so.6 >> #3 0x0000000000403900 in DT_NetAddrLookupHostAddress >> (to_netaddr=0x7e16f88, hostname=0x7e1658c "vic11-10g") at >> cmd/dapl_netaddr.c:136 >> #4 0x00000000004026cb in DT_Params_Parse (argc=optimized out>, >> argv=, params_ptr=0x7e16580) at >cmd/dapl_params.c:205 >> #5 0x000000000040211f in dapltest (argc=22, argv=0x7fff48e9b5f8) at >> cmd/dapl_main.c:88 >> #6 0x00000032f041d8a4 in __libc_start_main () from /lib64/libc.so.6 >> #7 0x0000000000401f59 in _start () >> (gdb) > >Its hurling in DT_Mdep_printf() here: > >> 134 /* Pull out IP address and print it as a sanity check */ >> 135 DT_Mdep_printf ("Server Name: %s \n", hostname); >> 136 DT_Mdep_printf ("Server Net Address: %s\n", >> 137 inet_ntoa(((struct sockaddr_in >> *)target->ai_addr)->sin_addr)); > >The ai_addr looks ok though: >> (gdb) p/x *((struct sockaddr_in *)target->ai_addr) >> $3 = {sin_family = 0x2, sin_port = 0x0, sin_addr = {s_addr = >> 0x8846a8c0}, sin_zero = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}} >> (gdb) >> > >Ever seen this? > >Steve. > From hrosenstock at xsigo.com Thu Apr 24 14:02:59 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 24 Apr 2008 14:02:59 -0700 Subject: [ofa-general] kernel-ib on rhel5 In-Reply-To: <1209065280.18036.216.camel@pc.ilinx> References: <1209065280.18036.216.camel@pc.ilinx> Message-ID: <1209070979.689.258.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-24 at 15:28 -0400, Brian J. 
Murrell wrote: > I wonder, what is the strategy for kernel-ib to exist on a machine with > the standard RHEL5 kernel installed. The standard RHEL5 kernel of > course includes an OFED release and as such modules of the same name as > the OFED ones. > > I can see that by default, the ofa_kernel.spec installs its modules > into /lib/modules/%{KVERSION}/updates but how does that ensure that when > a kernel module is loaded with modprobe that the one > in /lib/modules/%{KVERSION}/updates will be preferred over the one in > /lib/modules/%{KVERSION}/ (i.e. provided by the RHEL5 kernel RPM)? module-init-tools and modutils have supported this precedence for some time now. For modutils, see: https://rhn.redhat.com/errata/RHBA-2003-327.html -- Hal > Thanx, > b. > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From weiny2 at llnl.gov Thu Apr 24 14:31:25 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 24 Apr 2008 14:31:25 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <48109087.6030606@voltaire.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> Message-ID: <20080424143125.2aad1db8.weiny2@llnl.gov> On Thu, 24 Apr 2008 16:52:07 +0300 Or Gerlitz wrote: > Ira Weiny wrote: > > The symptom is that nodes drop out of the IPoIB mcast group after a node > > temporarily goes catatonic. The details are: > > > > 1) Issues on a node cause a soft lockup of the node. > > 2) OpenSM does a normal light sweep. > > 3) MADs to the node time out since the node is in a "bad state" > > 4) OpenSM marks the node down and drops it from internal tables, including > > mcast groups. > > 5) Node recovers from soft lock up condition.
> > 6) A subsequent sweep causes OpenSM see the node and add it back to the > > fabric. > As Hal noted, client reregister is the way to go. > > In a similar discussion in the past the conclusion was that the SM > should (maybe even according to the spec, but according to common sense > is fine as well, I think) set the re-register bit where in that case > IPoIB rejoins and we are done. At the time, I understood that openSM > would do so > (http://lists.openfabrics.org/pipermail/general/2007-September/041237.html), > am I wrong, or maybe the case brought on that thread (switch/port going > down and a whole sub fabric is removed from the SM point of view where > the links remain up from the view point of the nodes) was different? the > basic point is a case where a node link is UP and the SM lost this node > for some time and now sees it again. We used to call it "the > active/active" transition and an SM maybe need special logic for it. >

I have set up the following as a test situation:

              switch B
             /        \  (link X)
     switch A          switch C
        /               /     \
    Node1           node2    node3 (SM)

When I down link X and re-enable it, node 2 and 3 do _not_ rejoin the mcast group. Debug output from OpenSM indicates it is setting the rereg bit but I don't see the rejoin in the debug output from the node 2's IPoIB mcast layer. Perhaps there is a bug to be squashed here? Just in case anyone is curious, this is with OFED 1.2.5 on a RHEL 5.1 based kernel, and OpenSM 3.2.1-8341058-dirty. I am in the process of tracking this down, Ira From swise at opengridcomputing.com Thu Apr 24 14:32:45 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 24 Apr 2008 16:32:45 -0500 Subject: [ofa-general] Re: dapl bug [PATCH] dapltest: include definitions for inet_ntoa. In-Reply-To: References: <4810E62A.6070807@opengridcomputing.com> Message-ID: <4810FC7D.7040005@opengridcomputing.com> Davis, Arlin R wrote: > Steve, > > Sorry, this was fixed in v2.0 library but apparently it didn't get > pushed back v1.2. > > No worries.
Glad you already had seen it. But should I really be using dapltest on 1.2? IE is it used by folks to regression test udapl and their provider? Steve. From weiny2 at llnl.gov Thu Apr 24 14:35:55 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 24 Apr 2008 14:35:55 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <1209064023.689.249.camel@hrosenstock-ws.xsigo.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> <20080424095752.416d5d55.weiny2@llnl.gov> <1209064023.689.249.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080424143555.1daf93fe.weiny2@llnl.gov> On Thu, 24 Apr 2008 12:07:03 -0700 Hal Rosenstock wrote: > On Thu, 2008-04-24 at 09:57 -0700, Ira Weiny wrote: > > > > One side comment on the non OpenSM aspect of this: > > > > > > Why is the node temporarily unavailable ? There is a "contract" that the > > > node makes with the SM that it clearly isn't honoring. Is any > > > investigation going on relative to this aspect of the issue ? > > > > > > > Yes, we are working on finding the root cause. I agree that the "contract" is > > not being honored. This is one of the reasons I was hesitant to implement any > > fix to be submitted. > > I think the two issues can be tackled in parallel. > > > I don't think this is truly a bug in the stack. > > Any ideas on what it is ? If not, would you be willing to try something > assuming the end node issue is easily reproducible ? The root cause is something to do with a users job causing this "soft lockup" in the kernel. We believe sometimes they will run the node (diskless/no swap) out of memory. Under the OOM condition I don't think the node can be trusted. Unfortunately, this is another case where we can't seem to reproduce the issue without the users job. 
:-( As per a previous email I was excited about Or mentioning perhaps another way to simulate this condition on the IB side. I have set that up and see some issues there. I will see what I can find. > > > However, I could see this causing issues for people[*] and it might be nice to > > have a "fix". > > Sure; both are issues which should be understood better and fixed IMO. Agreed, I have spoken with our other developer and he is still trying to get a reproducer. Ira From arlin.r.davis at intel.com Thu Apr 24 15:22:25 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 24 Apr 2008 15:22:25 -0700 Subject: [ofa-general] RE: dapl bug [PATCH] dapltest: include definitions for inet_ntoa. In-Reply-To: <4810FC7D.7040005@opengridcomputing.com> References: <4810E62A.6070807@opengridcomputing.com> <4810FC7D.7040005@opengridcomputing.com> Message-ID: >But should I really be using dapltest on 1.2? IE is it used >by folks to >regression test udapl and their provider? Most MPI vendors are still running on top of uDAPL 1.2 so you should continue to regression test using dapltest 1.2 for now. From or.gerlitz at gmail.com Thu Apr 24 15:23:56 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 25 Apr 2008 01:23:56 +0300 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080424143125.2aad1db8.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> Message-ID: <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> On 4/25/08, Ira Weiny wrote: > > When I down link X and re-enable it node 2 and 3 do _not_ rejoin the mcast > group. bad! Just in case anyone is curious, this is with OFED 1.2.5 on a RHEL 5.1 based > kernel, and OpenSM 3.2.1-8341058-dirty. and what is the hca device and fw version at the nodes? maybe you send the list ipoib (debug_level=1 && multicast_debug_level=1) debug output? Or. 
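For anyone reproducing this, the knobs Or refers to are ib_ipoib module parameters. A sketch of turning them on (parameter names taken verbatim from his mail; exact spelling and availability depend on the OFED build, and they only exist when IPoIB was compiled with its debug config option, so check `modinfo ib_ipoib` first):

```shell
# reload IPoIB with verbose debug and multicast debug enabled
modprobe -r ib_ipoib
modprobe ib_ipoib debug_level=1 multicast_debug_level=1

# then watch the mcast join/leave activity while the link is bounced
tail -f /var/log/messages | grep -i mcast
```

This is a config fragment, not a script to run blindly: unloading ib_ipoib drops the IPoIB interfaces, so do it from the console or an Ethernet-side session.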
-------------- next part -------------- An HTML attachment was scrubbed... URL: From weiny2 at llnl.gov Thu Apr 24 18:16:57 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 24 Apr 2008 18:16:57 -0700 Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.) In-Reply-To: <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> Message-ID: <20080424181657.28d58a29.weiny2@llnl.gov> On Fri, 25 Apr 2008 01:23:56 +0300 "Or Gerlitz" wrote: > On 4/25/08, Ira Weiny wrote: > > > > When I down link X and re-enable it node 2 and 3 do _not_ rejoin the mcast > > group. > > > bad! > > Just in case anyone is curious, this is with OFED 1.2.5 on a RHEL 5.1 based > > kernel, and OpenSM 3.2.1-8341058-dirty. > > > and what is the hca device and fw version at the nodes? maybe you send the > list ipoib (debug_level=1 && multicast_debug_level=1) debug output? > I did not get any output with multicast_debug_level! But I added some more debugging and finally realized that the set was not being sent. :-( I put a debug statement in OpenSM where the flag was set and therefore thought that OpenSM had set the rereg bit. However, since no other data had changed the "set" MAD was not sent. (I am getting a bit tongue tied reading this back. I hope that all makes sense.) Here is a patch which fixes the problem. (At least with the partial sub-nets configuration I explained before.) I will have to verify this fixes the problem I originally reported. Ira >From 2e5511d6daf9c586c39698416e4bd36e24b13e62 Mon Sep 17 00:00:00 2001 From: Ira K. 
Weiny Date: Thu, 24 Apr 2008 18:05:01 -0700 Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit

Signed-off-by: Ira K. Weiny
---
 opensm/opensm/osm_lid_mgr.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c
index ab23929..4d628d2 100644
--- a/opensm/opensm/osm_lid_mgr.c
+++ b/opensm/opensm/osm_lid_mgr.c
@@ -1099,9 +1099,14 @@ __osm_lid_mgr_set_physp_pi(IN osm_lid_mgr_t * const p_mgr,
 	if ((p_mgr->p_subn->first_time_master_sweep == TRUE || p_port->is_new) &&
 	    !p_mgr->p_subn->opt.no_clients_rereg &&
 	    ((p_old_pi->capability_mask & IB_PORT_CAP_HAS_CLIENT_REREG) !=
-	     0))
+	     0)) {
+		OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG,
+			"Setting client rereg on %s, port %d\n",
+			p_port->p_node->print_desc,
+			p_port->p_physp->port_num);
 		ib_port_info_set_client_rereg(p_pi, 1);
-	else
+		send_set = TRUE;
+	} else
 		ib_port_info_set_client_rereg(p_pi, 0);

 	/* We need to send the PortInfo Set request with the new sm_lid
--
1.5.1

From andrea at qumranet.com Fri Apr 25 09:56:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 25 Apr 2008 18:56:40 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <200804221506.26226.rusty@rustcorp.com.au> References: <200804221506.26226.rusty@rustcorp.com.au> Message-ID: <20080425165639.GA23300@duo.random> I somehow missed this email in my inbox, found it now because it was strangely still unread... Sorry for the late reply!
On Tue, Apr 22, 2008 at 03:06:24PM +1000, Rusty Russell wrote: > On Wednesday 09 April 2008 01:44:04 Andrea Arcangeli wrote: > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1050,6 +1050,15 @@ > > unsigned long addr, unsigned long len, > > unsigned long flags, struct page **pages); > > > > +struct mm_lock_data { > > + spinlock_t **i_mmap_locks; > > + spinlock_t **anon_vma_locks; > > + unsigned long nr_i_mmap_locks; > > + unsigned long nr_anon_vma_locks; > > +}; > > +extern struct mm_lock_data *mm_lock(struct mm_struct * mm); > > +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); > > As far as I can tell you don't actually need to expose this struct at all? Yes, it should be possible to only expose 'struct mm_lock_data;'. > > + data->i_mmap_locks = vmalloc(nr_i_mmap_locks * > > + sizeof(spinlock_t)); > > This is why non-typesafe allocators suck. You want 'sizeof(spinlock_t *)' > here. > > > + data->anon_vma_locks = vmalloc(nr_anon_vma_locks * > > + sizeof(spinlock_t)); > > and here. Great catch! (it was temporarily wasting some ram which isn't nice at all) > > + err = -EINTR; > > + i_mmap_lock_last = NULL; > > + nr_i_mmap_locks = 0; > > + for (;;) { > > + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; > > + for (vma = mm->mmap; vma; vma = vma->vm_next) { > ... > > + data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock; > > + } > > + data->nr_i_mmap_locks = nr_i_mmap_locks; > > How about you track your running counter in data->nr_i_mmap_locks, leave > nr_i_mmap_locks alone, and BUG_ON(data->nr_i_mmap_locks != nr_i_mmap_locks)? > > Even nicer would be to wrap this in a "get_sorted_mmap_locks()" function. I'll try to clean this up further and I'll make a further update for review. > Unfortunately, I just don't think we can fail locking like this. In your next > patch unregistering a notifier can fail because of it: that not usable. 
Fortunately I figured out we don't really need mm_lock in unregister, because it's ok to unregister in the middle of the range_begin/end critical section (that's definitely not ok for register; that's why register needs mm_lock). And it's perfectly ok to fail in register(). Also it wasn't ok to unpin the module count in ->release, as ->release needs to 'ret' to get back to the mmu notifier code. And without any unregister at all, the module can't be unloaded at all, which is quite unacceptable...

The logic is to prevent mmu_notifier_register from racing with mmu_notifier_release, because register takes the mm_users pin (implicit or explicit, with mmput just after mmu_notifier_register returns). Then _register serializes against all the mmu notifier methods (except ->release) with srcu (->release can't run thanks to the mm_users pin). The mmu_notifier_mm->lock then serializes the modification on the list (register vs unregister) and it ensures that one and only one of _unregister and _release calls ->release before _unregister returns. All other methods run freely with srcu.

Having the guarantee that ->release is called just before all pages are freed, or inside _unregister, allows the module to zap and freeze its secondary mmu inside ->release, with the race condition of exit() against mmu_notifier_unregister handled internally by the mmu notifier code and without dependency on exit_files/exit_mm ordering, regardless of whether the fd of the driver is open in the filetables or in the vma only. The mmu_notifier_mm can be reset to 0 only after the last mmdrop.

About the mm_count refcounting for _release and _unregister: no mmu notifier method, not even mmu_notifier_unregister and _release, can cope with the mmu_notifier_mm list and srcu structures going away out of order. exit_mmap is safe as it holds an mm_count implicitly, because mmdrop is run after exit_mmap returns. mmu_notifier_unregister is safe too, as _register takes the mm_count pin.
We can't prevent mmu_notifier_mm from going away by holding mm_users, as that would screw up the vma file-descriptor closure that only happens inside exit_mmap (a pinned mm_users prevents exit_mmap from running, and it can only be taken temporarily until _register returns). From andrea at qumranet.com Fri Apr 25 10:04:25 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 25 Apr 2008 19:04:25 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080425165639.GA23300@duo.random> References: <200804221506.26226.rusty@rustcorp.com.au> <20080425165639.GA23300@duo.random> Message-ID: <20080425170425.GB23300@duo.random> On Fri, Apr 25, 2008 at 06:56:39PM +0200, Andrea Arcangeli wrote: > > > + data->i_mmap_locks = vmalloc(nr_i_mmap_locks * > > > + sizeof(spinlock_t)); > > > > This is why non-typesafe allocators suck. You want 'sizeof(spinlock_t *)' > > here. > > > > > + data->anon_vma_locks = vmalloc(nr_anon_vma_locks * > > > + sizeof(spinlock_t)); > > > > and here. > > Great catch! (it was temporarily wasting some ram which isn't nice at all) As I went into the editor I just found the above already fixed in #v14-pre3. And I can't move the structure into the file anymore without kmallocing it. Exposing that structure avoids the ERR_PTR/PTR_ERR on the retvals and one kmalloc so I think it makes the code simpler in the end to keep it as it is now. I'd rather avoid further changes to the 1/N patch, as long as they don't make any difference at runtime and as long as they involve more than cut-and-pasting a structure from .h to .c file.
From holt at sgi.com Fri Apr 25 12:25:32 2008 From: holt at sgi.com (Robin Holt) Date: Fri, 25 Apr 2008 14:25:32 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080425165639.GA23300@duo.random> References: <200804221506.26226.rusty@rustcorp.com.au> <20080425165639.GA23300@duo.random> Message-ID: <20080425192532.GA19717@sgi.com> On Fri, Apr 25, 2008 at 06:56:40PM +0200, Andrea Arcangeli wrote: > Fortunately I figured out we don't really need mm_lock in unregister > because it's ok to unregister in the middle of the range_begin/end > critical section (that's definitely not ok for register that's why > register needs mm_lock). And it's perfectly ok to fail in register(). I think you still need mm_lock (unless I miss something). What happens when one callout is scanning mmu_notifier_invalidate_range_start() and you unlink? That replaces the list next pointer with LIST_POISON1, which is a really bad address for the processor to track. Maybe I misunderstood your description. Thanks, Robin From rdreier at cisco.com Fri Apr 25 14:30:33 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 25 Apr 2008 14:30:33 -0700 Subject: [ofa-general][PATCH 2/12 v1] mlx4: HW queues resource management In-Reply-To: <480F4D7F.8000707@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 23 Apr 2008 17:53:51 +0300") References: <480F4D7F.8000707@mellanox.co.il> Message-ID: thanks, applied. From rdreier at cisco.com Fri Apr 25 14:33:28 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 25 Apr 2008 14:33:28 -0700 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: <200804241109.52448.okir@lst.de> (Olaf Kirch's message of "Thu, 24 Apr 2008 11:09:51 +0200") References: <200804241106.57172.okir@lst.de> <200804241108.58748.okir@lst.de> <200804241109.52448.okir@lst.de> Message-ID: Looks mostly OK...
the only thing I worry about is in the Sinai optimization case, do we run into trouble with bits getting carried into the top bits of the key? Can someone from Mellanox review this more carefully? - R. From rdreier at cisco.com Fri Apr 25 14:53:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 25 Apr 2008 14:53:23 -0700 Subject: [ofa-general][PATCH 12/12 v1] mlx4: QP to ready In-Reply-To: <480F519D.6060101@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 23 Apr 2008 18:11:25 +0300") References: <480F519D.6060101@mellanox.co.il> Message-ID: thanks, applied From andrea at qumranet.com Fri Apr 25 17:57:26 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sat, 26 Apr 2008 02:57:26 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080425192532.GA19717@sgi.com> References: <200804221506.26226.rusty@rustcorp.com.au> <20080425165639.GA23300@duo.random> <20080425192532.GA19717@sgi.com> Message-ID: <20080426005726.GA9514@duo.random> On Fri, Apr 25, 2008 at 02:25:32PM -0500, Robin Holt wrote: > I think you still need mm_lock (unless I miss something). What happens > when one callout is scanning mmu_notifier_invalidate_range_start() and > you unlink. That list next pointer with LIST_POISON1 which is a really > bad address for the processor to track. Ok, _release list_del_init can't race with that because it happens in exit_mmap when no other mmu notifier can trigger anymore. _unregister can run concurrently but it does list_del_rcu, which only overwrites the pprev pointer with LIST_POISON2. The mmu_notifier_invalidate_range_start won't crash on LIST_POISON1 thanks to srcu. Actually I did more changes than necessary, for example I noticed that mmu_notifier_register can use a list_add_head instead of list_add_head_rcu. _register can't race against _release thanks to the mm_users temporary or implicit pin.
_register can't race against _unregister thanks to the mmu_notifier_mm->lock. And register can't race against all other mmu notifiers thanks to the mm_lock. At this time I've no other pending patches on top of v14-pre3 other than the below micro-optimizing cleanup. It'd be great to have confirmation that v14-pre3 passes GRU/XPMEM regression tests as well as my KVM testing already passed successfully on it. I'll forward v14-pre3 mmu-notifier-core plus the below to Andrew tomorrow, I'm trying to be optimistic here! ;)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -187,7 +187,7 @@ int mmu_notifier_register(struct mmu_not
 	 * current->mm or explicitly with get_task_mm() or similar).
 	 */
 	spin_lock(&mm->mmu_notifier_mm->lock);
-	hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list);
+	hlist_add_head(&mn->hlist, &mm->mmu_notifier_mm->list);
 	spin_unlock(&mm->mmu_notifier_mm->lock);
 out_unlock:
 	mm_unlock(mm, &data);
From holt at sgi.com Sat Apr 26 06:17:34 2008 From: holt at sgi.com (Robin Holt) Date: Sat, 26 Apr 2008 08:17:34 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080424174145.GM24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> Message-ID: <20080426131734.GB19717@sgi.com> On Thu, Apr 24, 2008 at 07:41:45PM +0200, Andrea Arcangeli wrote: > A full new update will some become visible here: > > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14-pre3/ I grabbed these and built them. Only change needed was another include. After that, everything built fine and xpmem regression tests ran through the first four sets. The fifth is the oversubscription test which trips my xpmem bug. This is as good as the v12 runs from before. Since this include and the one for mm_types.h both are build breakages for ia64, I think you need to apply your ia64_cpumask and the following (possibly as a single patch) first or in your patch 1. Without that, ia64 doing a git-bisect could hit a build failure.
Index: mmu_v14_pre3_xpmem_v003_v1/include/linux/srcu.h =================================================================== --- mmu_v14_pre3_xpmem_v003_v1.orig/include/linux/srcu.h 2008-04-26 06:41:54.000000000 -0500 +++ mmu_v14_pre3_xpmem_v003_v1/include/linux/srcu.h 2008-04-26 07:01:17.292071827 -0500 @@ -27,6 +27,8 @@ #ifndef _LINUX_SRCU_H #define _LINUX_SRCU_H +#include + struct srcu_struct_array { int c[2]; }; From andrea at qumranet.com Sat Apr 26 07:04:06 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sat, 26 Apr 2008 16:04:06 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080426131734.GB19717@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> Message-ID: <20080426140406.GH9514@duo.random> On Sat, Apr 26, 2008 at 08:17:34AM -0500, Robin Holt wrote: > Since this include and the one for mm_types.h both are build breakages > for ia64, I think you need to apply your ia64_cpumask and the following > (possibly as a single patch) first or in your patch 1. Without that, > ia64 doing a git-bisect could hit a build failure. Agreed, so it doesn't risk to break ia64 compilation, thanks for the great XPMEM feedback! Also note, I figured out that mmu_notifier_release can actually run concurrently against other mmu notifiers in case there's a vmtruncate (->release could already run concurrently if invoked by _unregister, the only guarantee is that ->release will be called one time and only one time and that no mmu notifier will ever run after _unregister returns). In short I can't keep the list_del_init in _release and I need a list_del_init_rcu instead to fix this minor issue. 
So this won't really make much difference after all. I'll release #v14 with all this after a bit of kvm testing with it... diff --git a/include/linux/list.h b/include/linux/list.h --- a/include/linux/list.h +++ b/include/linux/list.h @@ -755,6 +755,14 @@ static inline void hlist_del_init(struct } } +static inline void hlist_del_init_rcu(struct hlist_node *n) +{ + if (!hlist_unhashed(n)) { + __hlist_del(n); + n->pprev = NULL; + } +} + /** * hlist_replace_rcu - replace old entry by new one * @old : the element to be replaced diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -22,7 +22,10 @@ struct mmu_notifier_ops { /* * Called either by mmu_notifier_unregister or when the mm is * being destroyed by exit_mmap, always before all pages are - * freed. It's mandatory to implement this method. + * freed. It's mandatory to implement this method. This can + * run concurrently to other mmu notifier methods and it + * should teardown all secondary mmu mappings and freeze the + * secondary mmu. */ void (*release)(struct mmu_notifier *mn, struct mm_struct *mm); diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -19,12 +19,13 @@ /* * This function can't run concurrently against mmu_notifier_register - * or any other mmu notifier method. mmu_notifier_register can only - * run with mm->mm_users > 0 (and exit_mmap runs only when mm_users is - * zero). All other tasks of this mm already quit so they can't invoke - * mmu notifiers anymore. This can run concurrently only against - * mmu_notifier_unregister and it serializes against it with the - * mmu_notifier_mm->lock in addition to RCU. struct mmu_notifier_mm + * because mm->mm_users > 0 during mmu_notifier_register and exit_mmap + * runs with mm_users == 0. 
Other tasks may still invoke mmu notifiers + * in parallel despite there's no task using this mm anymore, through + * the vmas outside of the exit_mmap context, like with + * vmtruncate. This serializes against mmu_notifier_unregister with + * the mmu_notifier_mm->lock in addition to SRCU and it serializes + * against the other mmu notifiers with SRCU. struct mmu_notifier_mm * can't go away from under us as exit_mmap holds a mm_count pin * itself. */ @@ -44,7 +45,7 @@ void __mmu_notifier_release(struct mm_st * to wait ->release to finish and * mmu_notifier_unregister to return. */ - hlist_del_init(&mn->hlist); + hlist_del_init_rcu(&mn->hlist); /* * SRCU here will block mmu_notifier_unregister until * ->release returns. @@ -185,6 +186,8 @@ int mmu_notifier_register(struct mmu_not * side note: mmu_notifier_release can't run concurrently with * us because we hold the mm_users pin (either implicitly as * current->mm or explicitly with get_task_mm() or similar). + * We can't race against any other mmu notifiers either thanks + * to mm_lock(). */ spin_lock(&mm->mmu_notifier_mm->lock); hlist_add_head(&mn->hlist, &mm->mmu_notifier_mm->list); From andrea at qumranet.com Sat Apr 26 09:46:38 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sat, 26 Apr 2008 18:46:38 +0200 Subject: [ofa-general] mmu notifier #v14 Message-ID: <20080426164511.GJ9514@duo.random> Hello everyone, here it is the mmu notifier #v14. http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/ Please everyone involved review and (hopefully ;) ack that this is safe to go in 2.6.26, the most important is to verify that this is a noop when disarmed regardless of MMU_NOTIFIER=y or =n. http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/mmu-notifier-core I'll be sending that patch to Andrew inbox. 
Signed-off-by: Andrea Arcangeli diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 8d45fab..ce3251c 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM tristate "Kernel-based Virtual Machine (KVM) support" depends on HAVE_KVM select PREEMPT_NOTIFIERS + select MMU_NOTIFIER select ANON_INODES ---help--- Support hosting fully virtualized guest machines using hardware diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 2ad6f54..853087a 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -663,6 +663,108 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn) account_shadowed(kvm, gfn); } +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) +{ + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT); + get_page(page); + rmap_remove(kvm, spte); + set_shadow_pte(spte, shadow_trap_nonpresent_pte); + kvm_flush_remote_tlbs(kvm); + put_page(page); +} + +static void kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte, *curr_spte; + + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + BUG_ON(!(*spte & PT_PRESENT_MASK)); + rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte); + curr_spte = spte; + spte = rmap_next(kvm, rmapp, spte); + kvm_unmap_spte(kvm, curr_spte); + } +} + +void kvm_unmap_hva(struct kvm *kvm, unsigned long hva) +{ + int i; + + /* + * If mmap_sem isn't taken, we can look the memslots with only + * the mmu_lock by skipping over the slots with userspace_addr == 0. 
+ */ + for (i = 0; i < kvm->nmemslots; i++) { + struct kvm_memory_slot *memslot = &kvm->memslots[i]; + unsigned long start = memslot->userspace_addr; + unsigned long end; + + /* mmu_lock protects userspace_addr */ + if (!start) + continue; + + end = start + (memslot->npages << PAGE_SHIFT); + if (hva >= start && hva < end) { + gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; + kvm_unmap_rmapp(kvm, &memslot->rmap[gfn_offset]); + } + } +} + +static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte; + int young = 0; + + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + int _young; + u64 _spte = *spte; + BUG_ON(!(_spte & PT_PRESENT_MASK)); + _young = _spte & PT_ACCESSED_MASK; + if (_young) { + young = !!_young; + set_shadow_pte(spte, _spte & ~PT_ACCESSED_MASK); + } + spte = rmap_next(kvm, rmapp, spte); + } + return young; +} + +int kvm_age_hva(struct kvm *kvm, unsigned long hva) +{ + int i; + int young = 0; + + /* + * If mmap_sem isn't taken, we can look the memslots with only + * the mmu_lock by skipping over the slots with userspace_addr == 0. 
+ */ + spin_lock(&kvm->mmu_lock); + for (i = 0; i < kvm->nmemslots; i++) { + struct kvm_memory_slot *memslot = &kvm->memslots[i]; + unsigned long start = memslot->userspace_addr; + unsigned long end; + + /* mmu_lock protects userspace_addr */ + if (!start) + continue; + + end = start + (memslot->npages << PAGE_SHIFT); + if (hva >= start && hva < end) { + gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; + young |= kvm_age_rmapp(kvm, &memslot->rmap[gfn_offset]); + } + } + spin_unlock(&kvm->mmu_lock); + + if (young) + kvm_flush_remote_tlbs(kvm); + + return young; +} + #ifdef MMU_DEBUG static int is_empty_shadow_page(u64 *spt) { @@ -1200,6 +1302,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) int r; int largepage = 0; pfn_t pfn; + int mmu_seq; down_read(&current->mm->mmap_sem); if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) { @@ -1207,6 +1310,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); @@ -1217,6 +1322,11 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); r = __direct_map(vcpu, v, write, largepage, gfn, pfn, PT32E_ROOT_LEVEL); @@ -1224,6 +1334,11 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) return r; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } @@ -1355,6 +1470,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, int r; int largepage = 0; gfn_t gfn = gpa >> PAGE_SHIFT; + int mmu_seq; ASSERT(vcpu); 
ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); @@ -1368,6 +1484,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, gfn &= ~(KVM_PAGES_PER_HPAGE-1); largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); if (is_error_pfn(pfn)) { @@ -1375,12 +1493,22 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, return 1; } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK, largepage, gfn, pfn, TDP_ROOT_LEVEL); spin_unlock(&vcpu->kvm->mmu_lock); return r; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } static void nonpaging_free(struct kvm_vcpu *vcpu) @@ -1643,11 +1771,11 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, int r; u64 gpte = 0; pfn_t pfn; - - vcpu->arch.update_pte.largepage = 0; + int mmu_seq; + int largepage; if (bytes != 4 && bytes != 8) - return; + goto out_lock; /* * Assume that the pte write on a page table of the same type @@ -1660,7 +1788,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, if ((bytes == 4) && (gpa % 4 == 0)) { r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8); if (r) - return; + goto out_lock; memcpy((void *)&gpte + (gpa % 8), new, 4); } else if ((bytes == 8) && (gpa % 8 == 0)) { memcpy((void *)&gpte, new, 8); @@ -1670,23 +1798,35 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, memcpy((void *)&gpte, new, 4); } if (!is_present_pte(gpte)) - return; + goto out_lock; gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; + largepage = 0; down_read(&current->mm->mmap_sem); if 
(is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) { gfn &= ~(KVM_PAGES_PER_HPAGE-1); - vcpu->arch.update_pte.largepage = 1; + largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); - if (is_error_pfn(pfn)) { - kvm_release_pfn_clean(pfn); - return; - } + if (is_error_pfn(pfn)) + goto out_release_and_lock; + + spin_lock(&vcpu->kvm->mmu_lock); + BUG_ON(!is_error_pfn(vcpu->arch.update_pte.pfn)); vcpu->arch.update_pte.gfn = gfn; vcpu->arch.update_pte.pfn = pfn; + vcpu->arch.update_pte.largepage = largepage; + vcpu->arch.update_pte.mmu_seq = mmu_seq; + return; + +out_release_and_lock: + kvm_release_pfn_clean(pfn); +out_lock: + spin_lock(&vcpu->kvm->mmu_lock); } void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, @@ -1711,7 +1851,6 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes); - spin_lock(&vcpu->kvm->mmu_lock); kvm_mmu_free_some_pages(vcpu); ++vcpu->kvm->stat.mmu_pte_write; kvm_mmu_audit(vcpu, "pre pte write"); @@ -1790,11 +1929,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, } } kvm_mmu_audit(vcpu, "post pte write"); - spin_unlock(&vcpu->kvm->mmu_lock); if (!is_error_pfn(vcpu->arch.update_pte.pfn)) { kvm_release_pfn_clean(vcpu->arch.update_pte.pfn); vcpu->arch.update_pte.pfn = bad_pfn; } + spin_unlock(&vcpu->kvm->mmu_lock); } int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 156fe10..4ac73a6 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -263,6 +263,12 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page, pfn = vcpu->arch.update_pte.pfn; if (is_error_pfn(pfn)) return; + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + 
return; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != + vcpu->arch.update_pte.mmu_seq)) + return; kvm_get_pfn(pfn); mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0, gpte & PT_DIRTY_MASK, NULL, largepage, gpte_to_gfn(gpte), @@ -380,6 +386,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, int r; pfn_t pfn; int largepage = 0; + int mmu_seq; pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); kvm_mmu_audit(vcpu, "pre page fault"); @@ -413,6 +420,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, largepage = 1; } } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, walker.gfn); up_read(&current->mm->mmap_sem); @@ -424,6 +433,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); shadow_pte = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, largepage, &write_pt, pfn); @@ -439,6 +453,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, spin_unlock(&vcpu->kvm->mmu_lock); return write_pt; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0ce5563..860559a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -3859,15 +3860,152 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) free_page((unsigned long)vcpu->arch.pio_data); } +static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) +{ + struct kvm_arch *kvm_arch; + kvm_arch = 
container_of(mn, struct kvm_arch, mmu_notifier); + return container_of(kvm_arch, struct kvm, arch); +} + +static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + /* + * When ->invalidate_page runs, the linux pte has been zapped + * already but the page is still allocated until + * ->invalidate_page returns. So if we increase the sequence + * here the kvm page fault will notice if the spte can't be + * established because the page is going to be freed. If + * instead the kvm page fault establishes the spte before + * ->invalidate_page runs, kvm_unmap_hva will release it + * before returning. + + * No need of memory barriers as the sequence increase only + * need to be seen at spin_unlock time, and not at spin_lock + * time. + * + * Increasing the sequence after the spin_unlock would be + * unsafe because the kvm page fault could then establish the + * pte after kvm_unmap_hva returned, without noticing the page + * is going to be freed. + */ + atomic_inc(&kvm->arch.mmu_notifier_seq); + spin_lock(&kvm->mmu_lock); + kvm_unmap_hva(kvm, address); + spin_unlock(&kvm->mmu_lock); +} + +static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + + /* + * The count increase must become visible at unlock time as no + * spte can be established without taking the mmu_lock and + * count is also read inside the mmu_lock critical section. 
+ */ + atomic_inc(&kvm->arch.mmu_notifier_count); + + spin_lock(&kvm->mmu_lock); + for (; start < end; start += PAGE_SIZE) + kvm_unmap_hva(kvm, start); + spin_unlock(&kvm->mmu_lock); +} + +static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + /* + * + * This sequence increase will notify the kvm page fault that + * the page that is going to be mapped in the spte could have + * been freed. + * + * There's also an implicit mb() here in this comment, + * provided by the last PT lock taken to zap pagetables, and + * that the read side has to take too in follow_page(). The + * sequence increase in the worst case will become visible to + * the kvm page fault after the spin_lock of the last PT lock + * of the last PT-lock-protected critical section preceeding + * invalidate_range_end. So if the kvm page fault is about to + * establish the spte inside the mmu_lock, while we're freeing + * the pages, it will have to backoff and when it retries, it + * will have to take the PT lock before it can check the + * pagetables again. And after taking the PT lock it will + * re-establish the pte even if it will see the already + * increased sequence number before calling gfn_to_pfn. + */ + atomic_inc(&kvm->arch.mmu_notifier_seq); + /* + * The sequence increase must be visible before count + * decrease. The page fault has to read count before sequence + * for this write order to be effective. 
+ */ + wmb(); + atomic_dec(&kvm->arch.mmu_notifier_count); + BUG_ON(atomic_read(&kvm->arch.mmu_notifier_count) < 0); +} + +static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + return kvm_age_hva(kvm, address); +} + +static void kvm_free_vcpus(struct kvm *kvm); +/* This must zap all the sptes because all pages will be freed then */ +static void kvm_mmu_notifier_release(struct mmu_notifier *mn, + struct mm_struct *mm) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + BUG_ON(mm != kvm->mm); + + kvm_destroy_common_vm(kvm); + + kvm_free_pit(kvm); + kfree(kvm->arch.vpic); + kfree(kvm->arch.vioapic); + kvm_free_vcpus(kvm); + kvm_free_physmem(kvm); + if (kvm->arch.apic_access_page) + put_page(kvm->arch.apic_access_page); +} + +static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { + .release = kvm_mmu_notifier_release, + .invalidate_page = kvm_mmu_notifier_invalidate_page, + .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start, + .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, + .clear_flush_young = kvm_mmu_notifier_clear_flush_young, +}; + struct kvm *kvm_arch_create_vm(void) { struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); + int err; if (!kvm) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops; + err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm); + if (err) { + kfree(kvm); + return ERR_PTR(err); + } + return kvm; } @@ -3899,13 +4037,12 @@ static void kvm_free_vcpus(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { - kvm_free_pit(kvm); - kfree(kvm->arch.vpic); - kfree(kvm->arch.vioapic); - kvm_free_vcpus(kvm); - kvm_free_physmem(kvm); - if (kvm->arch.apic_access_page) - put_page(kvm->arch.apic_access_page); + /* + * kvm_mmu_notifier_release() will be called before + * mmu_notifier_unregister returns, if it didn't 
run + * already. + */ + mmu_notifier_unregister(&kvm->arch.mmu_notifier, kvm->mm); kfree(kvm); } diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index 9d963cd..f07e321 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -13,6 +13,7 @@ #include #include +#include #include #include @@ -247,6 +248,7 @@ struct kvm_vcpu_arch { gfn_t gfn; /* presumed gfn during guest pte update */ pfn_t pfn; /* pfn corresponding to that gfn */ int largepage; + int mmu_seq; } update_pte; struct i387_fxsave_struct host_fx_image; @@ -314,6 +316,10 @@ struct kvm_arch{ struct page *apic_access_page; gpa_t wall_clock; + + struct mmu_notifier mmu_notifier; + atomic_t mmu_notifier_seq; + atomic_t mmu_notifier_count; }; struct kvm_vm_stat { @@ -434,6 +440,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu); int kvm_mmu_setup(struct kvm_vcpu *vcpu); void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte); +void kvm_unmap_hva(struct kvm *kvm, unsigned long hva); +int kvm_age_hva(struct kvm *kvm, unsigned long hva); int kvm_mmu_reset_context(struct kvm_vcpu *vcpu); void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); void kvm_mmu_zap_all(struct kvm *kvm); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4e16682..f089edc 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -267,6 +267,7 @@ void kvm_arch_check_processor_compat(void *rtn); int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); void kvm_free_physmem(struct kvm *kvm); +void kvm_destroy_common_vm(struct kvm *kvm); struct kvm *kvm_arch_create_vm(void); void kvm_arch_destroy_vm(struct kvm *kvm); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f095b73..4beae7a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -231,15 +231,19 @@ void kvm_free_physmem(struct kvm *kvm) kvm_free_physmem_slot(&kvm->memslots[i], NULL); } -static void kvm_destroy_vm(struct kvm *kvm) +void kvm_destroy_common_vm(struct kvm *kvm) { - struct 
mm_struct *mm = kvm->mm; - spin_lock(&kvm_lock); list_del(&kvm->vm_list); spin_unlock(&kvm_lock); kvm_io_bus_destroy(&kvm->pio_bus); kvm_io_bus_destroy(&kvm->mmio_bus); +} + +static void kvm_destroy_vm(struct kvm *kvm) +{ + struct mm_struct *mm = kvm->mm; + kvm_arch_destroy_vm(kvm); mmdrop(mm); } As usual you also need the kvm-mmu-notifier-lock patch to read the memslots with only the mmu_lock. Signed-off-by: Andrea Arcangeli diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c7ad235..8be6551 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3871,16 +3871,23 @@ int kvm_arch_set_memory_region(struct kvm *kvm, */ if (!user_alloc) { if (npages && !old.rmap) { + unsigned long userspace_addr; + down_write(&current->mm->mmap_sem); - memslot->userspace_addr = do_mmap(NULL, 0, - npages * PAGE_SIZE, - PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_ANONYMOUS, - 0); + userspace_addr = do_mmap(NULL, 0, + npages * PAGE_SIZE, + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANONYMOUS, + 0); up_write(&current->mm->mmap_sem); - if (IS_ERR((void *)memslot->userspace_addr)) - return PTR_ERR((void *)memslot->userspace_addr); + if (IS_ERR((void *)userspace_addr)) + return PTR_ERR((void *)userspace_addr); + + /* set userspace_addr atomically for kvm_hva_to_rmapp */ + spin_lock(&kvm->mmu_lock); + memslot->userspace_addr = userspace_addr; + spin_unlock(&kvm->mmu_lock); } else { if (!old.user_alloc && old.rmap) { int ret; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6a52c08..97bcc8d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -342,7 +342,15 @@ int __kvm_set_memory_region(struct kvm *kvm, memset(new.rmap, 0, npages * sizeof(*new.rmap)); new.user_alloc = user_alloc; - new.userspace_addr = mem->userspace_addr; + /* + * hva_to_rmmap() serializes with the mmu_lock and to be + * safe it has to ignore memslots with !user_alloc && + * !userspace_addr. 
+ */ + if (user_alloc) + new.userspace_addr = mem->userspace_addr; + else + new.userspace_addr = 0; } if (npages && !new.lpage_info) { int largepages = npages / KVM_PAGES_PER_HPAGE; @@ -374,14 +382,18 @@ int __kvm_set_memory_region(struct kvm *kvm, memset(new.dirty_bitmap, 0, dirty_bytes); } + spin_lock(&kvm->mmu_lock); if (mem->slot >= kvm->nmemslots) kvm->nmemslots = mem->slot + 1; *memslot = new; + spin_unlock(&kvm->mmu_lock); r = kvm_arch_set_memory_region(kvm, mem, old, user_alloc); if (r) { + spin_lock(&kvm->mmu_lock); *memslot = old; + spin_unlock(&kvm->mmu_lock); goto out_free; } From aliguori at us.ibm.com Sat Apr 26 11:59:23 2008 From: aliguori at us.ibm.com (Anthony Liguori) Date: Sat, 26 Apr 2008 13:59:23 -0500 Subject: [ofa-general] Re: mmu notifier #v14 In-Reply-To: <20080426164511.GJ9514@duo.random> References: <20080426164511.GJ9514@duo.random> Message-ID: <48137B8B.7010202@us.ibm.com> Andrea Arcangeli wrote: > Hello everyone, > > here it is the mmu notifier #v14. > > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/ > > Please everyone involved review and (hopefully ;) ack that this is > safe to go in 2.6.26, the most important is to verify that this is a > noop when disarmed regardless of MMU_NOTIFIER=y or =n. > > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/mmu-notifier-core > > I'll be sending that patch to Andrew inbox. 
> > Signed-off-by: Andrea Arcangeli > > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig > index 8d45fab..ce3251c 100644 > --- a/arch/x86/kvm/Kconfig > +++ b/arch/x86/kvm/Kconfig > @@ -21,6 +21,7 @@ config KVM > tristate "Kernel-based Virtual Machine (KVM) support" > depends on HAVE_KVM > select PREEMPT_NOTIFIERS > + select MMU_NOTIFIER > select ANON_INODES > ---help--- > Support hosting fully virtualized guest machines using hardware > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 2ad6f54..853087a 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -663,6 +663,108 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn) > account_shadowed(kvm, gfn); > } > > +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) > +{ > + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT); > + get_page(page); > You should not assume a struct page exists for any given spte. Instead, use kvm_get_pfn() and kvm_release_pfn_clean(). > static void nonpaging_free(struct kvm_vcpu *vcpu) > @@ -1643,11 +1771,11 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > int r; > u64 gpte = 0; > pfn_t pfn; > - > - vcpu->arch.update_pte.largepage = 0; > + int mmu_seq; > + int largepage; > > if (bytes != 4 && bytes != 8) > - return; > + goto out_lock; > > /* > * Assume that the pte write on a page table of the same type > @@ -1660,7 +1788,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > if ((bytes == 4) && (gpa % 4 == 0)) { > r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8); > if (r) > - return; > + goto out_lock; > memcpy((void *)&gpte + (gpa % 8), new, 4); > } else if ((bytes == 8) && (gpa % 8 == 0)) { > memcpy((void *)&gpte, new, 8); > @@ -1670,23 +1798,35 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > memcpy((void *)&gpte, new, 4); > } > if (!is_present_pte(gpte)) > - return; > + goto out_lock; > gfn = (gpte & PT64_BASE_ADDR_MASK) 
>> PAGE_SHIFT; > > + largepage = 0; > down_read(&current->mm->mmap_sem); > if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) { > gfn &= ~(KVM_PAGES_PER_HPAGE-1); > - vcpu->arch.update_pte.largepage = 1; > + largepage = 1; > } > + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); > + /* implicit mb(), we'll read before PT lock is unlocked */ > pfn = gfn_to_pfn(vcpu->kvm, gfn); > up_read(&current->mm->mmap_sem); > > - if (is_error_pfn(pfn)) { > - kvm_release_pfn_clean(pfn); > - return; > - } + > + if (is_error_pfn(pfn)) > + goto out_release_and_lock; > + > + spin_lock(&vcpu->kvm->mmu_lock); > + BUG_ON(!is_error_pfn(vcpu->arch.update_pte.pfn)); > vcpu->arch.update_pte.gfn = gfn; > vcpu->arch.update_pte.pfn = pfn; > + vcpu->arch.update_pte.largepage = largepage; > + vcpu->arch.update_pte.mmu_seq = mmu_seq; > + return; > + > +out_release_and_lock: > + kvm_release_pfn_clean(pfn); > +out_lock: > + spin_lock(&vcpu->kvm->mmu_lock); > } > Perhaps I just have a weak stomach but I am uneasy having a function that takes a lock on exit. I walked through the logic and it doesn't appear to be wrong but it also is pretty clear that you could defer the acquisition of the lock to the caller (in this case, kvm_mmu_pte_write) by moving the update_pte assignment into kvm_mmu_pte_write. > void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > @@ -1711,7 +1851,6 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > > pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); > mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes); > Worst case, you pass 4 more pointer arguments here, take the spin lock, and then depending on the result of mmu_guess_page_from_pte_write, update vcpu->arch.update_pte. 
> @@ -3899,13 +4037,12 @@ static void kvm_free_vcpus(struct kvm *kvm) > > void kvm_arch_destroy_vm(struct kvm *kvm) > { > - kvm_free_pit(kvm); > - kfree(kvm->arch.vpic); > - kfree(kvm->arch.vioapic); > - kvm_free_vcpus(kvm); > - kvm_free_physmem(kvm); > - if (kvm->arch.apic_access_page) > - put_page(kvm->arch.apic_access_page); > + /* > + * kvm_mmu_notifier_release() will be called before > + * mmu_notifier_unregister returns, if it didn't run > + * already. > + */ > + mmu_notifier_unregister(&kvm->arch.mmu_notifier, kvm->mm); > kfree(kvm); > } > Why move the destruction of the vm to the MMU notifier unregister hook? Does anything else ever call mmu_notifier_unregister that would implicitly destroy the VM? Regards, Anthony Liguori From dks at mediaweb.com Sat Apr 26 15:52:12 2008 From: dks at mediaweb.com (DK Smith) Date: Sat, 26 Apr 2008 15:52:12 -0700 Subject: [ofa-general] install.sh question In-Reply-To: <1207688301.1661.86.camel@localhost> References: <1207688301.1661.86.camel@localhost> Message-ID: <4813B21C.4020901@mediaweb.com> Bump +1 That's a good question. Frank Leers wrote: > Hi all, > > I'd like to be able to use the provided install.sh from cluster nodes to > install from a build which is shared over nfs, while utilizing an > ofed_net.conf The Install Guide talks about this, but I must be missing > something in the detail. > > Is there a way to not check if a build needs to be (re)done and simply > install the rpm's that were created during the original build, then > create the ifcfg-ib? devices based on the template file passed in with > -net ? I prefer not to have kernel sources, compiler, > etc. on these compute nodes, nor should I have to recompile for each > homogeneous node. 
> > thanks, > > -frank > Cheers, DK From andrea at qumranet.com Sat Apr 26 17:20:19 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sun, 27 Apr 2008 02:20:19 +0200 Subject: [ofa-general] Re: mmu notifier #v14 In-Reply-To: <48137B8B.7010202@us.ibm.com> References: <20080426164511.GJ9514@duo.random> <48137B8B.7010202@us.ibm.com> Message-ID: <20080427002019.GL9514@duo.random> On Sat, Apr 26, 2008 at 01:59:23PM -0500, Anthony Liguori wrote: >> +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) >> +{ >> + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> >> PAGE_SHIFT); >> + get_page(page); >> > > You should not assume a struct page exists for any given spte. Instead, use > kvm_get_pfn() and kvm_release_pfn_clean(). Last email from muli at ibm in my inbox argues it's useless to build rmap on mmio regions, so the above is more efficient: put_page runs directly on the page without going back and forth between spte -> pfn -> page -> pfn -> page in a single function. Certainly if we start building rmap on mmio regions we'll have to change that. > Perhaps I just have a weak stomach but I am uneasy having a function that > takes a lock on exit. I walked through the logic and it doesn't appear to > be wrong but it also is pretty clear that you could defer the acquisition > of the lock to the caller (in this case, kvm_mmu_pte_write) by moving the > update_pte assignment into kvm_mmu_pte_write. I agree out_lock is an uncommon exit path, the problem is that the code was buggy, and I tried to fix it with the smallest possible change and that resulted in an out_lock. That section likely needs a refactoring; all those update_pte fields should at least be returned by the function guess_.... but I tried to reduce the changes to make the issue more readable; I didn't want to rewrite certain functions just to take a spinlock a few instructions ahead.
> Worst case, you pass 4 more pointer arguments here and, take the spin lock, > and then depending on the result of mmu_guess_page_from_pte_write, update > vcpu->arch.update_pte. Yes, that was my idea as well, but that's left for a later patch. Fixing this bug mixed with the mmu notifier patch was perhaps excessive already ;). > Why move the destruction of the vm to the MMU notifier unregister hook? > Does anything else ever call mmu_notifier_unregister that would implicitly > destroy the VM? mmu notifier ->release can run at any time before the filehandle is closed. ->release has to zap all sptes and freeze the mmu (hence all vcpus) to prevent any further page fault. After ->release returns all pages are freed (we'll never rely on the page pin to keep the rmap_remove put_page from being a relevant unpin event). So the idea is that I wanted to maintain the same ordering of the current code in the vm destroy event, I didn't want to leave a partially shutdown VM on the vmlist. If the ordering is entirely irrelevant and the kvm_arch_destroy_vm can run well before kvm_destroy_vm is called, then I can avoid changes to kvm_main.c, but I doubt it. I've done it in a way that archs not needing mmu notifiers like s390 can simply add the kvm_destroy_common_vm at the top of their kvm_arch_destroy_vm. All others using mmu_notifiers have to invoke kvm_destroy_common_vm in the ->release of the mmu notifiers. This will ensure that everything will be ok regardless of whether exit_mmap is called before/after exit_files, and it won't make a whole lot of difference anymore, if the driver fd is pinned through vmas->vm_file released in exit_mmap or through the task file descriptors released in exit_files etc... In fact this allows calling mmu_notifier_unregister at any time later after the task has already been killed, without any trouble (like if the mmu notifier owner isn't registering in current->mm but some other task's mm).
From anthony at codemonkey.ws Sat Apr 26 18:54:23 2008 From: anthony at codemonkey.ws (Anthony Liguori) Date: Sat, 26 Apr 2008 20:54:23 -0500 Subject: [ofa-general] Re: [kvm-devel] mmu notifier #v14 In-Reply-To: <20080427002019.GL9514@duo.random> References: <20080426164511.GJ9514@duo.random> <48137B8B.7010202@us.ibm.com> <20080427002019.GL9514@duo.random> Message-ID: <4813DCCF.3020201@codemonkey.ws> Andrea Arcangeli wrote: > On Sat, Apr 26, 2008 at 01:59:23PM -0500, Anthony Liguori wrote: > >>> +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) >>> +{ >>> + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> >>> PAGE_SHIFT); >>> + get_page(page); >>> >>> >> You should not assume a struct page exists for any given spte. Instead, use >> kvm_get_pfn() and kvm_release_pfn_clean(). >> > > Last email from muli at ibm in my inbox argues it's useless to build rmap > on mmio regions, so the above is more efficient so put_page runs > directly on the page without going back and forth between spte -> pfn > -> page -> pfn -> page in a single function. > Avi can correct me if I'm wrong, but I don't think the consensus of that discussion was that we're going to avoid putting mmio pages in the rmap. Practically speaking, replacing: + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT); + get_page(page); With: unsigned long pfn = (*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; kvm_get_pfn(pfn); Results in exactly the same code except the latter allows mmio pfns in the rmap. So ignoring the whole mmio thing, using accessors that are already there and used elsewhere seems like a good idea :-) > Certainly if we start building rmap on mmio regions we'll have to > change that. > > >> Perhaps I just have a weak stomach but I am uneasy having a function that >> takes a lock on exit.
I walked through the logic and it doesn't appear to >> be wrong but it also is pretty clear that you could defer the acquisition >> of the lock to the caller (in this case, kvm_mmu_pte_write) by moving the >> update_pte assignment into kvm_mmu_pte_write. >> > > I agree out_lock is an uncommon exit path, the problem is that the > code was buggy, and I tried to fix it with the smallest possible > change and that resulting in an out_lock. That section likely need a > refactoring, all those update_pte fields should be at least returned > by the function guess_.... but I tried to reduce the changes to make > the issue more readable, I didn't want to rewrite certain functions > just to take a spinlock a few instructions ahead. > I appreciate the desire to minimize changes, but taking a lock on return seems to take that to a bit of an extreme. It seems like a simple thing to fix though, no? >> Why move the destruction of the vm to the MMU notifier unregister hook? >> Does anything else ever call mmu_notifier_unregister that would implicitly >> destroy the VM? >> > > mmu notifier ->release can run at anytime before the filehandle is > closed. ->release has to zap all sptes and freeze the mmu (hence all > vcpus) to prevent any further page fault. After ->release returns all > pages are freed (we'll never relay on the page pin to avoid the > rmap_remove put_page to be a relevant unpin event). So the idea is > that I wanted to maintain the same ordering of the current code in the > vm destroy event, I didn't want to leave a partially shutdown VM on > the vmlist. If the ordering is entirely irrelevant and the > kvm_arch_destroy_vm can run well before kvm_destroy_vm is called, then > I can avoid changes to kvm_main.c but I doubt. > > I've done it in a way that archs not needing mmu notifiers like s390 > can simply add the kvm_destroy_common_vm at the top of their > kvm_arch_destroy_vm. 
All others using mmu_notifiers have to invoke > kvm_destroy_common_vm in the ->release of the mmu notifiers. > > This will ensure that everything will be ok regardless if exit_mmap is > called before/after exit_files, and it won't make a whole lot of > difference anymore, if the driver fd is pinned through vmas->vm_file > released in exit_mmap or through the task filedescriptors relased in > exit_files etc... Infact this allows to call mmu_notifier_unregister > at anytime later after the task has already been killed, without any > trouble (like if the mmu notifier owner isn't registering in > current->mm but some other tasks mm). > I see. It seems a little strange to me as a KVM guest isn't really tied to the current mm. It seems like the net effect of this is that we are now tying a KVM guest to an mm. For instance, if you create a guest, but didn't assign any memory to it, you could transfer the fd to another process and then close the fd (without destroying the guest). The other process then could assign memory to it and presumably run the guest. With your change, as soon as the first process exits, the guest will be destroyed. I'm not sure this behavioral difference really matters but it is a behavioral difference. Regards, Anthony Liguori > ------------------------------------------------------------------------- > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference > Don't miss this year's exciting event. There's still time to save $100. > Use priority code J8TL2D2. 
> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone > _______________________________________________ > kvm-devel mailing list > kvm-devel at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/kvm-devel > From andrea at qumranet.com Sat Apr 26 20:05:14 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sun, 27 Apr 2008 05:05:14 +0200 Subject: [ofa-general] Re: [kvm-devel] mmu notifier #v14 In-Reply-To: <4813DCCF.3020201@codemonkey.ws> References: <20080426164511.GJ9514@duo.random> <48137B8B.7010202@us.ibm.com> <20080427002019.GL9514@duo.random> <4813DCCF.3020201@codemonkey.ws> Message-ID: <20080427030514.GM9514@duo.random> On Sat, Apr 26, 2008 at 08:54:23PM -0500, Anthony Liguori wrote: > Avi can correct me if I'm wrong, but I don't think the consensus of that > discussion was that we're going to avoid putting mmio pages in the rmap. My first impression on that discussion was that pci-passthrough mmio can't be swapped, can't require write throttling etc.. ;). From a linux VM pagetable point of view rmap on mmio looks weird. However, thinking some more, it's not like in the linux kernel, where write protect through rmap is needed only for write-throttling MAP_SHARED (which clearly is strictly RAM); for sptes we need it for every cr3 touch too, to trap pagetable updates (think ioremap done by the guest kernel). So I think Avi's take that we need rmap for everything mapped by sptes is probably the only feasible way to go. > Practically speaking, replacing: > > + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> > PAGE_SHIFT); > + get_page(page); > > > With: > > unsigned long pfn = (*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; > kvm_get_pfn(pfn); > > Results in exactly the same code except the later allows mmio pfns in the > rmap. So ignoring the whole mmio thing, using accessors that are already > there and used elsewhere seems like a good idea :-) Agreed, especially in light of the above.
I didn't actually touch that function for a while (I clearly wrote it before we started moving the kvm mmu code from page to pfn), and it was still safe to use to test the locking of the mmu notifier methods. My current main focus in the last few days was to get the locking right against the last mmu notifier code #v14 ;). Now that I look into it more closely, the get_page/put_page are unnecessary by now (they were only necessary with the older patches that didn't implement range_begin and that relied on page pinning). Not just in that function, but all reference counting inside kvm is now entirely useless and can be removed. NOTE: it is safe to flush the tlb outside the mmu_lock if done inside the mmu_notifier methods. But only mmu notifiers can defer the tlb flush after releasing mmu_lock, because the page can't be freed by the VM until we return. All other kvm code must instead definitely flush the tlb inside the mmu_lock, otherwise when the mmu notifier code runs, it will see the spte nonpresent and so the mmu notifier code will do nothing (it will not wait for kvm to drop the mmu_lock before allowing the main linux VM to free the page). The tlb flush must happen before the page is freed, and doing it inside mmu_lock everywhere (except in mmu-notifier context, where it can be done after releasing mmu_lock) guarantees it. The positive side of the tradeoff of having to do the tlb flush inside the mmu_lock is that KVM can now safely zap and unmap as many sptes as it wants and do a single tlb flush at the end. The pages can't be freed as long as the mmu_lock is held (this is why the tlb flush has to be done inside the mmu_lock). This model heavily reduces the tlb flush frequency for large spte-mangling, and tlb flushes here are quite expensive because of IPIs. > I appreciate the desire to minimize changes, but taking a lock on return > seems to take that to a bit of an extreme. It seems like a simple thing to > fix though, no?
I agree it needs to be rewritten as a cleaner fix, but probably in a separate patch (which has to be incremental, as that code will reject on the mmu notifier patch). I didn't see it as a big issue, however, to apply my quick fix first and clean up with an incremental update. > I see. It seems a little strange to me as a KVM guest isn't really tied to > the current mm. It seems like the net effect of this is that we are now > tying a KVM guest to an mm. > > For instance, if you create a guest, but didn't assign any memory to it, > you could transfer the fd to another process and then close the fd (without > destroying the guest). The other process then could assign memory to it > and presumably run the guest. Passing the anon kvm vm fd through unix sockets to another task is exactly why we need things like ->release not dependent on fd->release or vma->vm_file->release ordering in the do_exit path to tear down the VM. The guest itself is definitely tied to a "mm": the guest runs using get_user_pages, and get_user_pages is meaningless without an mm. But the fd where we run the ioctl isn't tied to the mm, it's just an fd that can be passed across tasks with unix sockets. > With your change, as soon as the first process exits, the guest will be > destroyed. I'm not sure this behavioral difference really matters but it > is a behavioral difference. The guest mode of the cpu can't run safely on any task but the one with the "mm" tracked by the mmu notifiers and where the memory is allocated from. The sptes point to the memory allocated in that "mm". It's definitely memory-corrupting to leave any spte established when the last thread of that "mm" exits, as the memory supposedly pointed by the orphaned sptes will go immediately to the freelist and be reused by the kernel. Keep in mind that there's no page pin on the memory pointed by the sptes.
The ioctl of the qemu userland could run in any other task with a mm different than the one of the guest, and ->release allows this to work fine without memory corruption and without requiring page pinning. As far as I can tell, your example explains why we need this fix ;). Here is an updated patch that passes my swap test (the only missing thing is the out_lock cleanup). Signed-off-by: Andrea Arcangeli diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 8d45fab..ce3251c 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM tristate "Kernel-based Virtual Machine (KVM) support" depends on HAVE_KVM select PREEMPT_NOTIFIERS + select MMU_NOTIFIER select ANON_INODES ---help--- Support hosting fully virtualized guest machines using hardware diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 2ad6f54..330eaed 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -663,6 +663,101 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn) account_shadowed(kvm, gfn); } +static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte, *curr_spte; + int need_tlb_flush = 0; + + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + BUG_ON(!(*spte & PT_PRESENT_MASK)); + rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte); + curr_spte = spte; + spte = rmap_next(kvm, rmapp, spte); + rmap_remove(kvm, curr_spte); + set_shadow_pte(curr_spte, shadow_trap_nonpresent_pte); + need_tlb_flush = 1; + } + return need_tlb_flush; +} + +int kvm_unmap_hva(struct kvm *kvm, unsigned long hva) +{ + int i; + int need_tlb_flush = 0; + + /* + * If mmap_sem isn't taken, we can look the memslots with only + * the mmu_lock by skipping over the slots with userspace_addr == 0.
+ */ + for (i = 0; i < kvm->nmemslots; i++) { + struct kvm_memory_slot *memslot = &kvm->memslots[i]; + unsigned long start = memslot->userspace_addr; + unsigned long end; + + /* mmu_lock protects userspace_addr */ + if (!start) + continue; + + end = start + (memslot->npages << PAGE_SHIFT); + if (hva >= start && hva < end) { + gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; + need_tlb_flush |= kvm_unmap_rmapp(kvm, + &memslot->rmap[gfn_offset]); + } + } + + return need_tlb_flush; +} + +static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte; + int young = 0; + + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + int _young; + u64 _spte = *spte; + BUG_ON(!(_spte & PT_PRESENT_MASK)); + _young = _spte & PT_ACCESSED_MASK; + if (_young) { + young = !!_young; + set_shadow_pte(spte, _spte & ~PT_ACCESSED_MASK); + } + spte = rmap_next(kvm, rmapp, spte); + } + return young; +} + +int kvm_age_hva(struct kvm *kvm, unsigned long hva) +{ + int i; + int young = 0; + + /* + * If mmap_sem isn't taken, we can look the memslots with only + * the mmu_lock by skipping over the slots with userspace_addr == 0. 
+ */ + for (i = 0; i < kvm->nmemslots; i++) { + struct kvm_memory_slot *memslot = &kvm->memslots[i]; + unsigned long start = memslot->userspace_addr; + unsigned long end; + + /* mmu_lock protects userspace_addr */ + if (!start) + continue; + + end = start + (memslot->npages << PAGE_SHIFT); + if (hva >= start && hva < end) { + gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; + young |= kvm_age_rmapp(kvm, &memslot->rmap[gfn_offset]); + } + } + + return young; +} + #ifdef MMU_DEBUG static int is_empty_shadow_page(u64 *spt) { @@ -1200,6 +1295,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) int r; int largepage = 0; pfn_t pfn; + int mmu_seq; down_read(&current->mm->mmap_sem); if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) { @@ -1207,6 +1303,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); @@ -1217,6 +1315,11 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); r = __direct_map(vcpu, v, write, largepage, gfn, pfn, PT32E_ROOT_LEVEL); @@ -1224,6 +1327,11 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) return r; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } @@ -1355,6 +1463,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, int r; int largepage = 0; gfn_t gfn = gpa >> PAGE_SHIFT; + int mmu_seq; ASSERT(vcpu); ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); @@ -1368,6 +1477,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu,
gva_t gpa, gfn &= ~(KVM_PAGES_PER_HPAGE-1); largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); if (is_error_pfn(pfn)) { @@ -1375,12 +1486,22 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, return 1; } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK, largepage, gfn, pfn, TDP_ROOT_LEVEL); spin_unlock(&vcpu->kvm->mmu_lock); return r; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } static void nonpaging_free(struct kvm_vcpu *vcpu) @@ -1643,11 +1764,11 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, int r; u64 gpte = 0; pfn_t pfn; - - vcpu->arch.update_pte.largepage = 0; + int mmu_seq; + int largepage; if (bytes != 4 && bytes != 8) - return; + goto out_lock; /* * Assume that the pte write on a page table of the same type @@ -1660,7 +1781,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, if ((bytes == 4) && (gpa % 4 == 0)) { r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8); if (r) - return; + goto out_lock; memcpy((void *)&gpte + (gpa % 8), new, 4); } else if ((bytes == 8) && (gpa % 8 == 0)) { memcpy((void *)&gpte, new, 8); @@ -1670,23 +1791,35 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, memcpy((void *)&gpte, new, 4); } if (!is_present_pte(gpte)) - return; + goto out_lock; gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; + largepage = 0; down_read(&current->mm->mmap_sem); if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) { gfn &= ~(KVM_PAGES_PER_HPAGE-1); - vcpu->arch.update_pte.largepage = 1; +
largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); - if (is_error_pfn(pfn)) { - kvm_release_pfn_clean(pfn); - return; - } + if (is_error_pfn(pfn)) + goto out_release_and_lock; + + spin_lock(&vcpu->kvm->mmu_lock); + BUG_ON(!is_error_pfn(vcpu->arch.update_pte.pfn)); vcpu->arch.update_pte.gfn = gfn; vcpu->arch.update_pte.pfn = pfn; + vcpu->arch.update_pte.largepage = largepage; + vcpu->arch.update_pte.mmu_seq = mmu_seq; + return; + +out_release_and_lock: + kvm_release_pfn_clean(pfn); +out_lock: + spin_lock(&vcpu->kvm->mmu_lock); } void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, @@ -1711,7 +1844,6 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes); - spin_lock(&vcpu->kvm->mmu_lock); kvm_mmu_free_some_pages(vcpu); ++vcpu->kvm->stat.mmu_pte_write; kvm_mmu_audit(vcpu, "pre pte write"); @@ -1790,11 +1922,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, } } kvm_mmu_audit(vcpu, "post pte write"); - spin_unlock(&vcpu->kvm->mmu_lock); if (!is_error_pfn(vcpu->arch.update_pte.pfn)) { kvm_release_pfn_clean(vcpu->arch.update_pte.pfn); vcpu->arch.update_pte.pfn = bad_pfn; } + spin_unlock(&vcpu->kvm->mmu_lock); } int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 156fe10..4ac73a6 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -263,6 +263,12 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page, pfn = vcpu->arch.update_pte.pfn; if (is_error_pfn(pfn)) return; + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + return; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != + vcpu->arch.update_pte.mmu_seq)) + return;
kvm_get_pfn(pfn); mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0, gpte & PT_DIRTY_MASK, NULL, largepage, gpte_to_gfn(gpte), @@ -380,6 +386,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, int r; pfn_t pfn; int largepage = 0; + int mmu_seq; pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); kvm_mmu_audit(vcpu, "pre page fault"); @@ -413,6 +420,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, largepage = 1; } } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, walker.gfn); up_read(&current->mm->mmap_sem); @@ -424,6 +433,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); shadow_pte = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, largepage, &write_pt, pfn); @@ -439,6 +453,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, spin_unlock(&vcpu->kvm->mmu_lock); return write_pt; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0ce5563..a026cb7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -3859,15 +3860,173 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) free_page((unsigned long)vcpu->arch.pio_data); } +static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) +{ + struct kvm_arch *kvm_arch; + kvm_arch = container_of(mn, struct kvm_arch, mmu_notifier); + return container_of(kvm_arch, struct kvm, arch); +} + +static void
kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + int need_tlb_flush; + + /* + * When ->invalidate_page runs, the linux pte has been zapped + * already but the page is still allocated until + * ->invalidate_page returns. So if we increase the sequence + * here the kvm page fault will notice if the spte can't be + * established because the page is going to be freed. If + * instead the kvm page fault establishes the spte before + * ->invalidate_page runs, kvm_unmap_hva will release it + * before returning. + + * No need of memory barriers as the sequence increase only + * need to be seen at spin_unlock time, and not at spin_lock + * time. + * + * Increasing the sequence after the spin_unlock would be + * unsafe because the kvm page fault could then establish the + * pte after kvm_unmap_hva returned, without noticing the page + * is going to be freed. + */ + atomic_inc(&kvm->arch.mmu_notifier_seq); + spin_lock(&kvm->mmu_lock); + need_tlb_flush = kvm_unmap_hva(kvm, address); + spin_unlock(&kvm->mmu_lock); + + /* we've to flush the tlb before the pages can be freed */ + if (need_tlb_flush) + kvm_flush_remote_tlbs(kvm); + +} + +static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + int need_tlb_flush = 0; + + /* + * The count increase must become visible at unlock time as no + * spte can be established without taking the mmu_lock and + * count is also read inside the mmu_lock critical section. 
+ */ + atomic_inc(&kvm->arch.mmu_notifier_count); + + spin_lock(&kvm->mmu_lock); + for (; start < end; start += PAGE_SIZE) + need_tlb_flush |= kvm_unmap_hva(kvm, start); + spin_unlock(&kvm->mmu_lock); + + /* we've to flush the tlb before the pages can be freed */ + if (need_tlb_flush) + kvm_flush_remote_tlbs(kvm); +} + +static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + /* + * + * This sequence increase will notify the kvm page fault that + * the page that is going to be mapped in the spte could have + * been freed. + * + * There's also an implicit mb() here in this comment, + * provided by the last PT lock taken to zap pagetables, and + * that the read side has to take too in follow_page(). The + * sequence increase in the worst case will become visible to + * the kvm page fault after the spin_lock of the last PT lock + * of the last PT-lock-protected critical section preceeding + * invalidate_range_end. So if the kvm page fault is about to + * establish the spte inside the mmu_lock, while we're freeing + * the pages, it will have to backoff and when it retries, it + * will have to take the PT lock before it can check the + * pagetables again. And after taking the PT lock it will + * re-establish the pte even if it will see the already + * increased sequence number before calling gfn_to_pfn. + */ + atomic_inc(&kvm->arch.mmu_notifier_seq); + /* + * The sequence increase must be visible before count + * decrease. The page fault has to read count before sequence + * for this write order to be effective. 
+ */ + wmb(); + atomic_dec(&kvm->arch.mmu_notifier_count); + BUG_ON(atomic_read(&kvm->arch.mmu_notifier_count) < 0); +} + +static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + int young; + + spin_lock(&kvm->mmu_lock); + young = kvm_age_hva(kvm, address); + spin_unlock(&kvm->mmu_lock); + + if (young) + kvm_flush_remote_tlbs(kvm); + + return young; +} + +static void kvm_free_vcpus(struct kvm *kvm); +/* This must zap all the sptes because all pages will be freed then */ +static void kvm_mmu_notifier_release(struct mmu_notifier *mn, + struct mm_struct *mm) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + BUG_ON(mm != kvm->mm); + + kvm_destroy_common_vm(kvm); + + kvm_free_pit(kvm); + kfree(kvm->arch.vpic); + kfree(kvm->arch.vioapic); + kvm_free_vcpus(kvm); + kvm_free_physmem(kvm); + if (kvm->arch.apic_access_page) + put_page(kvm->arch.apic_access_page); +} + +static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { + .release = kvm_mmu_notifier_release, + .invalidate_page = kvm_mmu_notifier_invalidate_page, + .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start, + .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, + .clear_flush_young = kvm_mmu_notifier_clear_flush_young, +}; + struct kvm *kvm_arch_create_vm(void) { struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); + int err; if (!kvm) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops; + err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm); + if (err) { + kfree(kvm); + return ERR_PTR(err); + } + return kvm; } @@ -3899,13 +4058,12 @@ static void kvm_free_vcpus(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { - kvm_free_pit(kvm); - kfree(kvm->arch.vpic); - kfree(kvm->arch.vioapic); - kvm_free_vcpus(kvm); - kvm_free_physmem(kvm); - if (kvm->arch.apic_access_page) - 
put_page(kvm->arch.apic_access_page); + /* + * kvm_mmu_notifier_release() will be called before + * mmu_notifier_unregister returns, if it didn't run + * already. + */ + mmu_notifier_unregister(&kvm->arch.mmu_notifier, kvm->mm); kfree(kvm); } diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index 9d963cd..7b8deea 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -13,6 +13,7 @@ #include #include +#include #include #include @@ -247,6 +248,7 @@ struct kvm_vcpu_arch { gfn_t gfn; /* presumed gfn during guest pte update */ pfn_t pfn; /* pfn corresponding to that gfn */ int largepage; + int mmu_seq; } update_pte; struct i387_fxsave_struct host_fx_image; @@ -314,6 +316,10 @@ struct kvm_arch{ struct page *apic_access_page; gpa_t wall_clock; + + struct mmu_notifier mmu_notifier; + atomic_t mmu_notifier_seq; + atomic_t mmu_notifier_count; }; struct kvm_vm_stat { @@ -434,6 +440,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu); int kvm_mmu_setup(struct kvm_vcpu *vcpu); void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte); +int kvm_unmap_hva(struct kvm *kvm, unsigned long hva); +int kvm_age_hva(struct kvm *kvm, unsigned long hva); int kvm_mmu_reset_context(struct kvm_vcpu *vcpu); void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); void kvm_mmu_zap_all(struct kvm *kvm); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4e16682..f089edc 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -267,6 +267,7 @@ void kvm_arch_check_processor_compat(void *rtn); int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); void kvm_free_physmem(struct kvm *kvm); +void kvm_destroy_common_vm(struct kvm *kvm); struct kvm *kvm_arch_create_vm(void); void kvm_arch_destroy_vm(struct kvm *kvm); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f095b73..4beae7a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -231,15 +231,19 @@ void kvm_free_physmem(struct kvm *kvm) 
kvm_free_physmem_slot(&kvm->memslots[i], NULL); } -static void kvm_destroy_vm(struct kvm *kvm) +void kvm_destroy_common_vm(struct kvm *kvm) { - struct mm_struct *mm = kvm->mm; - spin_lock(&kvm_lock); list_del(&kvm->vm_list); spin_unlock(&kvm_lock); kvm_io_bus_destroy(&kvm->pio_bus); kvm_io_bus_destroy(&kvm->mmio_bus); +} + +static void kvm_destroy_vm(struct kvm *kvm) +{ + struct mm_struct *mm = kvm->mm; + kvm_arch_destroy_vm(kvm); mmdrop(mm); } From erezz at voltaire.com Sat Apr 26 23:20:55 2008 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 27 Apr 2008 09:20:55 +0300 Subject: [ofa-general] [PATCH 1/1] RPM Spec files In-Reply-To: References: Message-ID: <48141B47.4080408@voltaire.com> Mike Heinz wrote: > Installation of OFED 1.3.0.0.4 onto a Kusu/OCS cluster does not fully > succeed because of some missing dependencies in the RPM spec files. This > is because Kusu installs nodes over a network by presenting a pool of > RPMs to be installed and letting RPM figure out the order to install > them in. Without the dependencies we ended up with oddities like the > kernel drivers being installed before the /usr/bin directory had been > populated, causing the install script to fail. > > I was able to work around this by manually expanding some of the source > RPM files, altering the spec file and repackaging the source RPM. This > allowed me to build binary RPMs (via the install script) that could be > installed on a Kusu cluster. > > Here are the proposed changes. If there is a better/preferred way of > submitting this suggestion, please let me know. > Some general comments: * OFED issues are discussed in the ewg list. You should send patches to that list. * You have patches for multiple git trees (bonding, open-iscsi etc). You should separate them to multiple patches. Each patch should have a separate e-mail message (and add the maintainer to the thread). The best thing to do is to create a patch set. * Please create the patches against the relevant git trees. 
It will make it easier to apply them. See more comments below. > > --- ../../original/ib-bonding.spec 2008-04-22 12:54:12.000000000 > -0400 > +++ ib-bonding.spec 2008-04-22 12:43:07.000000000 -0400 > @@ -20,6 +20,7 @@ > Group : Applications/System > License : GPL > BuildRoot: %{_tmppath}/%{name}-%{version}-root > +PreReq : coreutils > > %description > This package provides a bonding device which is capable of enslaving > --- ../../original/ofa_kernel.spec 2008-04-22 12:54:13.000000000 > -0400 > +++ ofa_kernel.spec 2008-04-22 12:45:40.000000000 -0400 > @@ -111,6 +111,9 @@ > BuildRequires: sysfsutils-devel > > %package -n kernel-ib > +PreReq: coreutils > +PreReq: kernel > +PreReq: pciutils > Version: %{_version} > Release: %{krelver} > Summary: Infiniband Driver and ULPs kernel modules > @@ -119,6 +122,10 @@ > Core, HW and ULPs kernel modules > > %package -n kernel-ib-devel > +PreReq: coreutils > +PreReq: kernel > +PreReq: pciutils > +Requires: kernel-ib > Version: %{_version} > Release: %{krelver} > Summary: Infiniband Driver and ULPs kernel modules sources > --- ../../original/open-iscsi-generic.spec 2008-04-22 > If this change is relevant for open-iscsi.git, it is also relevant for open-iscsi-rh4.git. BTW - you can see the list of git trees here: http://www.openfabrics.org/git/ Erez From vlad at dev.mellanox.co.il Sat Apr 26 23:35:45 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 27 Apr 2008 09:35:45 +0300 Subject: [ofa-general] install.sh question In-Reply-To: <1207688301.1661.86.camel@localhost> References: <1207688301.1661.86.camel@localhost> Message-ID: <48141EC1.7010801@dev.mellanox.co.il> Frank Leers wrote: > Hi all, > > I'd like to be able to use the provided install.sh from cluster nodes to > install from a build which is shared over nfs, while utilizing an > ofed_net.conf The Install Guide talks about this, but I must be missing > something in the detail. 
> > Is there a way to not check if a build needs to be (re)done and simply > install the rpm's that were created during the original build, then > create the ifcfg-ib? devices based on the template file passed in with > -net ? I prefer not to have kernel sources, compiler, > etc. on these compute nodes, nor should I have to recompile for each > homogeneous node. > > thanks, > > -frank > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > Hi Frank, install.sh checks if there are binary RPMs for all selected packages under the OFED-x.x.x/RPMS directory. If you have created binary RPMs on one of the nodes (by the install.sh script), then make sure that the OFED-x.x.x/ofed.conf file includes only these packages. Then run on all cluster nodes (no kernel sources, compilers, ... required on these nodes): > ./install.sh -c ofed.conf -net ofed_net.conf Note: If there are no RPMs for one or more of the packages selected (package_name=y) in the ofed.conf file, then install.sh will run the RPM build process. Regards, Vladimir
From vlad at dev.mellanox.co.il Sun Apr 27 01:07:11 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 27 Apr 2008 11:07:11 +0300 Subject: [ofa-general] [PATCH 1/1] RPM Spec files In-Reply-To: References: Message-ID: <4814342F.2050509@dev.mellanox.co.il> Mike Heinz wrote: ... > --- ../../original/ofa_kernel.spec 2008-04-22 12:54:13.000000000 > -0400 > +++ ofa_kernel.spec 2008-04-22 12:45:40.000000000 -0400 > @@ -111,6 +111,9 @@ > BuildRequires: sysfsutils-devel > > %package -n kernel-ib > +PreReq: coreutils > +PreReq: kernel > +PreReq: pciutils > Version: %{_version} > Release: %{krelver} > Summary: Infiniband Driver and ULPs kernel modules > @@ -119,6 +122,10 @@ > Core, HW and ULPs kernel modules > > %package -n kernel-ib-devel > +PreReq: coreutils > +PreReq: kernel > +PreReq: pciutils > +Requires: kernel-ib > Version: %{_version} > Release: %{krelver} > Summary: Infiniband Driver and ULPs kernel modules sources ... > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > Applied to ofa_kernel.spec.
Regards, Vladimir From sashak at voltaire.com Sun Apr 27 04:38:01 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 27 Apr 2008 11:38:01 +0000 Subject: [ofa-general] madrpc_init and resetting performance counters In-Reply-To: <47FFCF16.6020302@isomerica.net> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> Message-ID: <20080427113801.GC22406@sashak.voltaire.com> Hi Dan, On 16:50 Fri 11 Apr , Dan Noe wrote: > > The solution Joel had mentioned was to use madrpc_init() and then call > port_performance_reset() to reset the port. But madrpc_init keeps a static > file descriptor (mad_portid) that is used for subsequent calls (such as is > eventually used when port_performance_reset() is called). And, there does > not seem to be any method to close this file descriptor. > > So, it is impossible to extend this method to multiple devices (or even > multiple ports). With a single call to madrpc_init one can perpetually > reset the performance counters in the polling loop but this approach > doesn't work with multiple devices. Why do you need to open multiple devices/ports? Are you using this tool to handle multiple IB subnets? > If madrpc_init is called more than > once, it leaks a file descriptor. Yes, madrpc_init() is old and it works that way. There are newer mad_rpc_open_port() and mad_rpc_close_port() functions in libibmad which support multiple devices/ports; you can use them instead. Sasha From ogerlitz at voltaire.com Sun Apr 27 01:47:54 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 27 Apr 2008 11:47:54 +0300 Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
In-Reply-To: <20080424181657.28d58a29.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> <20080424181657.28d58a29.weiny2@llnl.gov> Message-ID: <48143DBA.3080701@voltaire.com> Ira Weiny wrote: > > I did not get any output with multicast_debug_level! Why would you? From the node's point of view nothing has happened. (Also, the exact param name is mcast_debug_level.) > > Here is a patch which fixes the problem. (At least with the partial sub-nets > configuration I explained before.) I will have to verify this fixes the problem > I originally reported. OK, good. Does this problem exist in the released OpenSM? If yes, what would be the trigger for the SM to "really discover" (i.e. do a PortInfo SET on) this sub-fabric, and how much time would it take to reach this trigger in the worst case? The failure configuration you have set up to reproduce the problem is very atypical, I think. Under common Clos etc. topologies, which don't have a 1:n blocking nature, failure of such a link would cause a re-route etc. by the SM, which would not (and should not) be noticed by the nodes (I hope I am not falling into another problem here...) Or. From sashak at voltaire.com Sun Apr 27 06:53:36 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 27 Apr 2008 13:53:36 +0000 Subject: [ofa-general] Re: [PATCH] opensm/configure.in: Fix the QOS and prefix routes config file default locations In-Reply-To: <20080422140601.64764e18.weiny2@llnl.gov> References: <20080422140601.64764e18.weiny2@llnl.gov> Message-ID: <20080427135336.GH22406@sashak.voltaire.com> On 14:06 Tue 22 Apr , Ira Weiny wrote: > From ef37654c0917875129fa2bad2e8ee0dd0d3f8859 Mon Sep 17 00:00:00 2001 > From: Ira K.
Weiny > Date: Fri, 18 Apr 2008 15:51:58 -0700 > Subject: [PATCH] opensm/configure.in: Fix the QOS and prefix routes config file default > locations > > Signed-off-by: Ira K. Weiny Applied. Thanks. Sasha From andrea at qumranet.com Sun Apr 27 05:27:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sun, 27 Apr 2008 14:27:27 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080426131734.GB19717@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> Message-ID: <20080427122727.GO9514@duo.random> On Sat, Apr 26, 2008 at 08:17:34AM -0500, Robin Holt wrote: > the first four sets. The fifth is the oversubscription test which trips > my xpmem bug. This is as good as the v12 runs from before. Now that mmu-notifier-core #v14 seems finished and hopefully will appear in 2.6.26 ;), I started exercising the kvm-mmu-notifier code more, with the full patchset applied and not only mmu-notifier-core. I soon found the full patchset has a swap deadlock bug. Then I tried without using kvm (so with the mmu notifier disarmed) and I could still reproduce the crashes. After grabbing a few stack traces I tracked it down to a bug in the i_mmap_lock->i_mmap_sem conversion. If your oversubscription test means swapping, you should retest with this applied on top of the #v14 i_mmap_sem patch, as it would eventually deadlock with all tasks allocating memory in D state without it. Now the full patchset is as rock solid as with only mmu-notifier-core applied. It's been swapping a 2G memhog on top of a 3G VM with 2G of ram for the last few hours without a problem. Everything is working great with KVM at least.
Talking about post 2.6.26: the refcount with rcu in the anon-vma conversion seems unnecessary and may explain part of the AIM slowdown too. The rest looks ok and probably we should switch the code to a compile-time decision between rwlock and rwsem (so obsoleting the current spinlock). diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1008,7 +1008,7 @@ static int try_to_unmap_file(struct page list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) vma->vm_private_data = NULL; out: - up_write(&mapping->i_mmap_sem); + up_read(&mapping->i_mmap_sem); return ret; } From dorfman.eli at gmail.com Sun Apr 27 05:49:33 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Sun, 27 Apr 2008 15:49:33 +0300 Subject: [ofa-general] [PATCH 0/2] IB/iSER: Calculating the VA in iSER header Message-ID: <694d48600804270549p1945a618t9ff3aac21c9f6114@mail.gmail.com> The following patch set includes a bug fix for the VA value in the iSER header. The current value is incorrect according to the iSER spec. This patch set includes a bug fix for the initiator code that was made against the 2.6.26 branch and a fix for the iSER code in STGT. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dorfman.eli at gmail.com Sun Apr 27 05:53:19 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Sun, 27 Apr 2008 15:53:19 +0300 Subject: [ofa-general] [PATCH 1/2] IB/iSER: Do not add unsolicited data offset to VA in iSER header Message-ID: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> iSER initiator sends a VA (in the iSER header) which includes an offset for the unsolicited data (which is wrong according to the spec). 
Signed-off-by: Eli Dorfman Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iser_initiator.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 08dc81c..5c2bbc6 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -154,12 +154,12 @@ iser_prepare_write_cmd(struct iscsi_cmd_task *ctask, if (unsol_sz < edtl) { hdr->flags |= ISER_WSV; hdr->write_stag = cpu_to_be32(regd_buf->reg.rkey); - hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz); + hdr->write_va = cpu_to_be64(regd_buf->reg.va); iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X " - "VA:%#llX + unsol:%d\n", + "VA:%#llX\n", ctask->itt, regd_buf->reg.rkey, - (unsigned long long)regd_buf->reg.va, unsol_sz); + (unsigned long long)regd_buf->reg.va); } if (imm_sz > 0) { -- 1.5.5 From dorfman.eli at gmail.com Sun Apr 27 05:55:00 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Sun, 27 Apr 2008 15:55:00 +0300 Subject: [ofa-general] [PATCH 2/2] IB/iSER: Use offset from r2t header for rdma Message-ID: <694d48600804270555i6ee55843x51c416294fec6397@mail.gmail.com> Use offset from r2t header for rdma instead of using internal offset counter. 
Signed-off-by: Eli Dorfman --- usr/iscsi/iscsi_rdma.c | 16 +++++----------- 1 files changed, 5 insertions(+), 11 deletions(-) diff --git a/usr/iscsi/iscsi_rdma.c b/usr/iscsi/iscsi_rdma.c index d46ddff..84f5949 100644 --- a/usr/iscsi/iscsi_rdma.c +++ b/usr/iscsi/iscsi_rdma.c @@ -1447,28 +1447,22 @@ static int iscsi_rdma_rdma_read(struct iscsi_connection *conn) struct iscsi_r2t_rsp *r2t = (struct iscsi_r2t_rsp *) &conn->rsp.bhs; uint8_t *buf; uint32_t len; + uint32_t offset; int ret; buf = (uint8_t *) task->data + task->offset; len = be32_to_cpu(r2t->data_length); + offset = be32_to_cpu(r2t->data_offset); - dprintf("len %u stag %x va %llx\n", + dprintf("len %u stag %x va %llx offset %x\n", len, itask->rem_write_stag, - (unsigned long long) itask->rem_write_va); + (unsigned long long) itask->rem_write_va, offset); ret = iser_post_rdma_wr(ci, task, buf, len, IBV_WR_RDMA_READ, - itask->rem_write_va, itask->rem_write_stag); + itask->rem_write_va + offset, itask->rem_write_stag); if (ret < 0) return ret; - /* - * Initiator registers the entire buffer, but gives us a VA that - * is advanced by immediate + unsolicited data amounts. Advance - * rem_va as we read, knowing that the target always grabs segments - * in order. - */ - itask->rem_write_va += len; - return 0; } -- 1.5.5 From sashak at voltaire.com Sun Apr 27 10:11:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 27 Apr 2008 17:11:40 +0000 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080423133816.6c1b6315.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> Message-ID: <20080427171140.GI22406@sashak.voltaire.com> Hi Ira, On 13:38 Wed 23 Apr , Ira Weiny wrote: > > The symptom is that nodes drop out of the IPoIB mcast group after a node > temporarily goes catatonic. The details are: > > 1) Issues on a node cause a soft lockup of the node. > 2) OpenSM does a normal light sweep. 
> 3) MADs to the node time out since the node is in a "bad state" Normally, during a light sweep, OpenSM will not query nodes. I think OpenSM should not detect such a soft lockup unless the IB link state changed and a heavy sweep was triggered. Is this the case? > 4) OpenSM marks the node down and drops it from internal tables, including > mcast groups. > 5) Node recovers from soft lock up condition. > 6) A subsequent sweep causes OpenSM see the node and add it back to the > fabric. > 7) Node is fully functional on the verbs layer but IPoIB never knew anything > was wrong so it does _not_ rejoin the mcast groups. (This is different > from the condition where the link actually goes down.) If my approach above is correct, it should be the same as port down/up handling. And as was already noted in this thread, OpenSM should ask for reregistration (by setting the client reregistration bit). I see your patch - it seems this part is buggy in OpenSM now; I will take a closer look. Sasha From tziporet at dev.mellanox.co.il Sun Apr 27 07:17:09 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 27 Apr 2008 17:17:09 +0300 Subject: [ewg] Re: [ofa-general] Agenda for the OFED meeting today In-Reply-To: <480E43C0.6080107@opengridcomputing.com> References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> <480E43C0.6080107@opengridcomputing.com> Message-ID: <48148AE5.4020801@mellanox.co.il> Steve Wise wrote: > >> Note: daily builds of 1.3.1 are already available at: >> _http://www.openfabrics.org/builds/ofed-1.3.1_ >> > > Is there a new git repos for the 1.3.1 kernel? Or just using the 1.3 > repos?
> We use the same git tree as for 1.3 Tziporet From swise at opengridcomputing.com Sun Apr 27 08:54:56 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 10:54:56 -0500 Subject: [ofa-general] [PATCH 2.6.26 0/3] RDMA/cxgb3: fixes and enhancements for 2.6.26 Message-ID: <20080427155456.31018.22282.stgit@dell3.ogc.int> The following series fixes some bugs as well as enabling peer-2-peer applications including OpenMPI and HPMPI. I hope this can make 2.6.26. NOTE: The changes in patch 3 require a new firmware version. I added the version change to drivers/net/cxgb3/version.h in this patch so that the changes that require the new firmware as well as the version bump are all in one git commit. This keeps things like 'git bisect' from leaving the driver broken. -- Steve. From swise at opengridcomputing.com Sun Apr 27 09:00:06 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 11:00:06 -0500 Subject: [ofa-general] [PATCH 2.6.26 1/3] RDMA/cxgb3: Correctly serialize peer abort path. In-Reply-To: <20080427155456.31018.22282.stgit@dell3.ogc.int> References: <20080427155456.31018.22282.stgit@dell3.ogc.int> Message-ID: <20080427160006.31018.66715.stgit@dell3.ogc.int> OpenMPI and other stress testing exposed a few bad bugs in handling aborts in the middle of a normal close. - serialize abort reply and peer abort processing with disconnect processing - warn (and ignore) if ep timer is stopped when it wasn't running - cleaned up disconnect path to correctly deal with aborting and dead endpoints - in iwch_modify_qp(), add a ref to the ep before releasing the qp lock if iwch_ep_disconnect() will be called. The ref is dropped after calling disconnect.
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 98 ++++++++++++++++++++++----------- drivers/infiniband/hw/cxgb3/iwch_cm.h | 1 drivers/infiniband/hw/cxgb3/iwch_qp.c | 6 ++ 3 files changed, 71 insertions(+), 34 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 99f2f2a..1627bff 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -125,6 +125,12 @@ static void start_ep_timer(struct iwch_ep *ep) static void stop_ep_timer(struct iwch_ep *ep) { PDBG("%s ep %p\n", __FUNCTION__, ep); + if (!timer_pending(&ep->timer)) { + printk(KERN_ERR "%s timer stopped when its not running! ep %p state %u\n", + __FUNCTION__, ep, ep->com.state); + WARN_ON(1); + return; + } del_timer_sync(&ep->timer); put_ep(&ep->com); } @@ -1083,8 +1089,11 @@ static int tx_ack(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) static int abort_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) { struct iwch_ep *ep = ctx; + unsigned long flags; + int release = 0; PDBG("%s ep %p\n", __FUNCTION__, ep); + BUG_ON(!ep); /* * We get 2 abort replies from the HW. 
The first one must @@ -1095,9 +1104,22 @@ static int abort_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) return CPL_RET_BUF_DONE; } - close_complete_upcall(ep); - state_set(&ep->com, DEAD); - release_ep_resources(ep); + spin_lock_irqsave(&ep->com.lock, flags); + switch (ep->com.state) { + case ABORTING: + close_complete_upcall(ep); + __state_set(&ep->com, DEAD); + release = 1; + break; + default: + printk(KERN_ERR "%s ep %p state %d\n", + __FUNCTION__, ep, ep->com.state); + break; + } + spin_unlock_irqrestore(&ep->com.lock, flags); + + if (release) + release_ep_resources(ep); return CPL_RET_BUF_DONE; } @@ -1470,7 +1492,8 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) struct sk_buff *rpl_skb; struct iwch_qp_attributes attrs; int ret; - int state; + int release = 0; + unsigned long flags; if (is_neg_adv_abort(req->status)) { PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep, @@ -1488,9 +1511,9 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) return CPL_RET_BUF_DONE; } - state = state_read(&ep->com); - PDBG("%s ep %p state %u\n", __FUNCTION__, ep, state); - switch (state) { + spin_lock_irqsave(&ep->com.lock, flags); + PDBG("%s ep %p state %u\n", __FUNCTION__, ep, ep->com.state); + switch (ep->com.state) { case CONNECTING: break; case MPA_REQ_WAIT: @@ -1536,21 +1559,25 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) break; case DEAD: PDBG("%s PEER_ABORT IN DEAD STATE!!!!\n", __FUNCTION__); + spin_unlock_irqrestore(&ep->com.lock, flags); return CPL_RET_BUF_DONE; default: BUG_ON(1); break; } dst_confirm(ep->dst); + if (ep->com.state != ABORTING) { + __state_set(&ep->com, DEAD); + release = 1; + } + spin_unlock_irqrestore(&ep->com.lock, flags); rpl_skb = get_skb(skb, sizeof(*rpl), GFP_KERNEL); if (!rpl_skb) { printk(KERN_ERR MOD "%s - cannot allocate skb!\n", __FUNCTION__); - dst_release(ep->dst); - l2t_release(L2DATA(ep->com.tdev), ep->l2t); - put_ep(&ep->com); - 
return CPL_RET_BUF_DONE; + release = 1; + goto out; } rpl_skb->priority = CPL_PRIORITY_DATA; rpl = (struct cpl_abort_rpl *) skb_put(rpl_skb, sizeof(*rpl)); @@ -1559,10 +1586,9 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) OPCODE_TID(rpl) = htonl(MK_OPCODE_TID(CPL_ABORT_RPL, ep->hwtid)); rpl->cmd = CPL_ABORT_NO_RST; cxgb3_ofld_send(ep->com.tdev, rpl_skb); - if (state != ABORTING) { - state_set(&ep->com, DEAD); +out: + if (release) release_ep_resources(ep); - } return CPL_RET_BUF_DONE; } @@ -1661,15 +1687,18 @@ static void ep_timeout(unsigned long arg) struct iwch_ep *ep = (struct iwch_ep *)arg; struct iwch_qp_attributes attrs; unsigned long flags; + int abort=1; spin_lock_irqsave(&ep->com.lock, flags); PDBG("%s ep %p tid %u state %d\n", __FUNCTION__, ep, ep->hwtid, ep->com.state); switch (ep->com.state) { case MPA_REQ_SENT: + __state_set(&ep->com, ABORTING); connect_reply_upcall(ep, -ETIMEDOUT); break; case MPA_REQ_WAIT: + __state_set(&ep->com, ABORTING); break; case CLOSING: case MORIBUND: @@ -1679,13 +1708,17 @@ static void ep_timeout(unsigned long arg) ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 1); } + __state_set(&ep->com, ABORTING); break; default: - BUG(); + printk(KERN_ERR "%s unexpected state ep %p state %u\n", + __FUNCTION__, ep, ep->com.state); + WARN_ON(1); + abort=0; } - __state_set(&ep->com, CLOSING); spin_unlock_irqrestore(&ep->com.lock, flags); - abort_connection(ep, NULL, GFP_ATOMIC); + if (abort) + abort_connection(ep, NULL, GFP_ATOMIC); put_ep(&ep->com); } @@ -1968,34 +2001,33 @@ int iwch_ep_disconnect(struct iwch_ep *ep, int abrupt, gfp_t gfp) PDBG("%s ep %p state %s, abrupt %d\n", __FUNCTION__, ep, states[ep->com.state], abrupt); - if (ep->com.state == DEAD) { - PDBG("%s already dead ep %p\n", __FUNCTION__, ep); - goto out; - } - - if (abrupt) { - if (ep->com.state != ABORTING) { - ep->com.state = ABORTING; - close = 1; - } - goto out; - } - switch (ep->com.state) { case MPA_REQ_WAIT: case MPA_REQ_SENT: case 
MPA_REQ_RCVD: case MPA_REP_SENT: case FPDU_MODE: - start_ep_timer(ep); - ep->com.state = CLOSING; close = 1; + if (abrupt) + ep->com.state = ABORTING; + else { + ep->com.state = CLOSING; + start_ep_timer(ep); + } break; case CLOSING: - ep->com.state = MORIBUND; close = 1; + if (abrupt) { + stop_ep_timer(ep); + ep->com.state = ABORTING; + } else + ep->com.state = MORIBUND; break; case MORIBUND: + case ABORTING: + case DEAD: + PDBG("%s ignoring disconnect ep %p state %u\n", + __FUNCTION__, ep, ep->com.state); break; default: BUG(); diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h index 6107e7c..a3fb959 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.h +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h @@ -56,6 +56,7 @@ #define put_ep(ep) { \ PDBG("put_ep (via %s:%u) ep %p refcnt %d\n", __FUNCTION__, __LINE__, \ ep, atomic_read(&((ep)->kref.refcount))); \ + WARN_ON(atomic_read(&((ep)->kref.refcount)) < 1); \ kref_put(&((ep)->kref), __free_ep); \ } diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index ea2cdd7..c02bb94 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -832,6 +832,7 @@ int iwch_modify_qp(struct iwch_dev *rhp, struct iwch_qp *qhp, abort=0; disconnect = 1; ep = qhp->ep; + get_ep(&ep->com); } flush_qp(qhp, &flag); break; @@ -848,6 +849,7 @@ int iwch_modify_qp(struct iwch_dev *rhp, struct iwch_qp *qhp, abort=1; disconnect = 1; ep = qhp->ep; + get_ep(&ep->com); } goto err; break; @@ -929,8 +931,10 @@ out: * on the EP. This can be a normal close (RTS->CLOSING) or * an abnormal close (RTS/CLOSING->ERROR). 
*/ - if (disconnect) + if (disconnect) { iwch_ep_disconnect(ep, abort, GFP_KERNEL); + put_ep(&ep->com); + } /* * If free is 1, then we've disassociated the EP from the QP From swise at opengridcomputing.com Sun Apr 27 09:00:08 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 11:00:08 -0500 Subject: [ofa-general] [PATCH 2.6.26 2/3] RDMA/cxgb3: Correctly set the max_mr_size device attribute. In-Reply-To: <20080427155456.31018.22282.stgit@dell3.ogc.int> References: <20080427155456.31018.22282.stgit@dell3.ogc.int> Message-ID: <20080427160008.31018.15516.stgit@dell3.ogc.int> cxgb3 only supports 4GB memory regions. The lustre RDMA code uses this attribute and currently has to code around our bad setting. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/cxio_hal.h | 1 + drivers/infiniband/hw/cxgb3/iwch.c | 1 + drivers/infiniband/hw/cxgb3/iwch.h | 1 + drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 +- 4 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h index 99543d6..2bcff7f 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.h +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h @@ -53,6 +53,7 @@ #define T3_MAX_PBL_SIZE 256 #define T3_MAX_RQ_SIZE 1024 #define T3_MAX_NUM_STAG (1<<15) +#define T3_MAX_MR_SIZE 0x100000000ULL #define T3_STAG_UNSET 0xffffffff diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c index 0315c9d..98a768f 100644 --- a/drivers/infiniband/hw/cxgb3/iwch.c +++ b/drivers/infiniband/hw/cxgb3/iwch.c @@ -83,6 +83,7 @@ static void rnic_init(struct iwch_dev *rnicp) rnicp->attr.max_phys_buf_entries = T3_MAX_PBL_SIZE; rnicp->attr.max_pds = T3_MAX_NUM_PD - 1; rnicp->attr.mem_pgsizes_bitmask = 0x7FFF; /* 4KB-128MB */ + rnicp->attr.max_mr_size = T3_MAX_MR_SIZE; rnicp->attr.can_resize_wq = 0; rnicp->attr.max_rdma_reads_per_qp = 8; rnicp->attr.max_rdma_read_resources = diff --git a/drivers/infiniband/hw/cxgb3/iwch.h 
b/drivers/infiniband/hw/cxgb3/iwch.h index caf4e60..238c103 100644 --- a/drivers/infiniband/hw/cxgb3/iwch.h +++ b/drivers/infiniband/hw/cxgb3/iwch.h @@ -66,6 +66,7 @@ struct iwch_rnic_attributes { * size (4k)^i. Phys block list mode unsupported. */ u32 mem_pgsizes_bitmask; + u64 max_mr_size; u8 can_resize_wq; /* diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index b2ea921..f7df213 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -998,7 +998,7 @@ static int iwch_query_device(struct ib_device *ibdev, props->device_cap_flags = dev->device_cap_flags; props->vendor_id = (u32)dev->rdev.rnic_info.pdev->vendor; props->vendor_part_id = (u32)dev->rdev.rnic_info.pdev->device; - props->max_mr_size = ~0ull; + props->max_mr_size = dev->attr.max_mr_size; props->max_qp = dev->attr.max_qps; props->max_qp_wr = dev->attr.max_wrs; props->max_sge = dev->attr.max_sge_per_wr; From swise at opengridcomputing.com Sun Apr 27 09:00:10 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 11:00:10 -0500 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. In-Reply-To: <20080427155456.31018.22282.stgit@dell3.ogc.int> References: <20080427155456.31018.22282.stgit@dell3.ogc.int> Message-ID: <20080427160010.31018.67436.stgit@dell3.ogc.int> Open MPI, Intel MPI and other applications don't support the iWARP requirement that the client side send the first RDMA message. This class of application connection setup is called peer-2-peer. Typically once the connection is setup, _both_ sides want to send data. This patch enables supporting peer-2-peer over the chelsio rnic by enforcing this iWARP requirement in the driver itself as part of RDMA connection setup. Connection setup is extended, when peer2peer is 1, such that the MPA initiator will send a 0B Read (the RTR) just after connection setup. 
The MPA responder will suspend SQ processing until the RTR message is received and replied to. Design: - Add a module option, peer2peer, to enable this mode. - New firmware support for peer-2-peer mode: - new bits in the rdma_init WR to tell it to do peer-2-peer and what form of RTR message to send or expect. - process _all_ preposted recvs before moving the connection into rdma mode. - passive side: defer completing the rdma_init WR until all pre-posted recvs are processed. Suspend SQ processing until the RTR is received. - active side: expect and process the 0B read WR on offload tx queue. Defer completing the rdma_init WR until all pre-posted recvs are processed. Suspend SQ processing until the 0B read WR is processed from the offload tx queue. - If peer2peer is set, driver posts 0B read request on offload tx queue just after posting the rdma_init wr to the offload tx queue. - Add cq poll logic to ignore unsolicited read responses. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/cxio_hal.c | 18 ++++++- drivers/infiniband/hw/cxgb3/cxio_wr.h | 21 +++++++- drivers/infiniband/hw/cxgb3/iwch_cm.c | 68 +++++++++++++++++++-------- drivers/infiniband/hw/cxgb3/iwch_cm.h | 1 drivers/infiniband/hw/cxgb3/iwch_provider.h | 3 + drivers/infiniband/hw/cxgb3/iwch_qp.c | 54 ++++++++++++++++++++- drivers/net/cxgb3/version.h | 2 - 7 files changed, 137 insertions(+), 30 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index 03c5ff6..3de0fbf 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -456,7 +456,8 @@ void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count) ptr = cq->sw_rptr; while (!Q_EMPTY(ptr, cq->sw_wptr)) { cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2)); - if ((SQ_TYPE(*cqe) || (CQE_OPCODE(*cqe) == T3_READ_RESP)) && + if ((SQ_TYPE(*cqe) || + ((CQE_OPCODE(*cqe) == T3_READ_RESP) && wq->oldest_read)) && (CQE_QPID(*cqe) == wq->qpid))
(*count)++; ptr++; @@ -829,7 +830,8 @@ int cxio_rdma_init(struct cxio_rdev *rdev_p, struct t3_rdma_init_attr *attr) wqe->mpaattrs = attr->mpaattrs; wqe->qpcaps = attr->qpcaps; wqe->ulpdu_size = cpu_to_be16(attr->tcp_emss); - wqe->flags = cpu_to_be32(attr->flags); + wqe->rqe_count = cpu_to_be16(attr->rqe_count); + wqe->flags_rtr_type = cpu_to_be16(attr->flags|V_RTR_TYPE(attr->rtr_type)); wqe->ord = cpu_to_be32(attr->ord); wqe->ird = cpu_to_be32(attr->ird); wqe->qp_dma_addr = cpu_to_be64(attr->qp_dma_addr); @@ -1135,6 +1137,18 @@ int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe, if (RQ_TYPE(*hw_cqe) && (CQE_OPCODE(*hw_cqe) == T3_READ_RESP)) { /* + * If this is an unsolicited read response, then the read + * was generated by the kernel driver as part of peer-2-peer + * connection setup. So ignore the completion. + */ + if (!wq->oldest_read) { + if (CQE_STATUS(*hw_cqe)) + wq->error = 1; + ret = -1; + goto skip_cqe; + } + + /* * Don't write to the HWCQ, so create a new read req CQE * in local memory. 
*/ diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h index 969d4d9..f1a25a8 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_wr.h +++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h @@ -278,6 +278,17 @@ enum t3_qp_caps { uP_RI_QP_STAG0_ENABLE = 0x10 } __attribute__ ((packed)); +enum rdma_init_rtr_types { + RTR_READ = 1, + RTR_WRITE = 2, + RTR_SEND = 3, +}; + +#define S_RTR_TYPE 2 +#define M_RTR_TYPE 0x3 +#define V_RTR_TYPE(x) ((x) << S_RTR_TYPE) +#define G_RTR_TYPE(x) ((((x) >> S_RTR_TYPE)) & M_RTR_TYPE) + struct t3_rdma_init_attr { u32 tid; u32 qpid; @@ -293,7 +304,9 @@ struct t3_rdma_init_attr { u32 ird; u64 qp_dma_addr; u32 qp_dma_size; - u32 flags; + enum rdma_init_rtr_types rtr_type; + u16 flags; + u16 rqe_count; u32 irs; }; @@ -309,8 +322,8 @@ struct t3_rdma_init_wr { u8 mpaattrs; /* 5 */ u8 qpcaps; __be16 ulpdu_size; - __be32 flags; /* bits 31-1 - reservered */ - /* bit 0 - set if RECV posted */ + __be16 flags_rtr_type; + __be16 rqe_count; __be32 ord; /* 6 */ __be32 ird; __be64 qp_dma_addr; /* 7 */ @@ -324,7 +337,7 @@ struct t3_genbit { }; enum rdma_init_wr_flags { - RECVS_POSTED = (1<<0), + MPA_INITIATOR = (1<<0), PRIV_QP = (1<<1), }; diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 1627bff..f4f3c9e 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -63,6 +63,10 @@ static char *states[] = { NULL, }; +int peer2peer = 0; +module_param(peer2peer, int, 0644); +MODULE_PARM_DESC(peer2peer, "Support peer2peer ULPs (default=0)"); + static int ep_timeout_secs = 10; module_param(ep_timeout_secs, int, 0644); MODULE_PARM_DESC(ep_timeout_secs, "CM Endpoint operation timeout " @@ -514,7 +518,7 @@ static void send_mpa_req(struct iwch_ep *ep, struct sk_buff *skb) skb_reset_transport_header(skb); len = skb->len; req = (struct tx_data_wr *) skb_push(skb, sizeof(*req)); - req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)); + req->wr_hi = 
htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)|F_WR_COMPL); req->wr_lo = htonl(V_WR_TID(ep->hwtid)); req->len = htonl(len); req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) | @@ -565,7 +569,7 @@ static int send_mpa_reject(struct iwch_ep *ep, const void *pdata, u8 plen) set_arp_failure_handler(skb, arp_failure_discard); skb_reset_transport_header(skb); req = (struct tx_data_wr *) skb_push(skb, sizeof(*req)); - req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)); + req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)|F_WR_COMPL); req->wr_lo = htonl(V_WR_TID(ep->hwtid)); req->len = htonl(mpalen); req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) | @@ -617,7 +621,7 @@ static int send_mpa_reply(struct iwch_ep *ep, const void *pdata, u8 plen) skb_reset_transport_header(skb); len = skb->len; req = (struct tx_data_wr *) skb_push(skb, sizeof(*req)); - req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)); + req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)|F_WR_COMPL); req->wr_lo = htonl(V_WR_TID(ep->hwtid)); req->len = htonl(len); req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) | @@ -885,6 +889,7 @@ static void process_mpa_reply(struct iwch_ep *ep, struct sk_buff *skb) * the MPA header is valid. */ state_set(&ep->com, FPDU_MODE); + ep->mpa_attr.initiator = 1; ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 
1 : 0; @@ -907,8 +912,14 @@ static void process_mpa_reply(struct iwch_ep *ep, struct sk_buff *skb) /* bind QP and TID with INIT_WR */ err = iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, mask, &attrs, 1); - if (!err) - goto out; + if (err) + goto err; + + if (peer2peer && iwch_rqes_posted(ep->com.qp) == 0) { + iwch_post_zb_read(ep->com.qp); + } + + goto out; err: abort_connection(ep, skb, GFP_KERNEL); out: @@ -1001,6 +1012,7 @@ static void process_mpa_request(struct iwch_ep *ep, struct sk_buff *skb) * If we get here we have accumulated the entire mpa * start reply message including private data. */ + ep->mpa_attr.initiator = 0; ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 1 : 0; @@ -1071,17 +1083,33 @@ static int tx_ack(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) PDBG("%s ep %p credits %u\n", __FUNCTION__, ep, credits); - if (credits == 0) + if (credits == 0) { + PDBG(KERN_ERR "%s 0 credit ack ep %p state %u\n", + __FUNCTION__, ep, state_read(&ep->com)); return CPL_RET_BUF_DONE; + } + BUG_ON(credits != 1); - BUG_ON(ep->mpa_skb == NULL); - kfree_skb(ep->mpa_skb); - ep->mpa_skb = NULL; dst_confirm(ep->dst); - if (state_read(&ep->com) == MPA_REP_SENT) { - ep->com.rpl_done = 1; - PDBG("waking up ep %p\n", ep); - wake_up(&ep->com.waitq); + if (!ep->mpa_skb) { + PDBG("%s rdma_init wr_ack ep %p state %u\n", + __FUNCTION__, ep, state_read(&ep->com)); + if (ep->mpa_attr.initiator) { + PDBG("%s initiator ep %p state %u\n", + __FUNCTION__, ep, state_read(&ep->com)); + if (peer2peer) + iwch_post_zb_read(ep->com.qp); + } else { + PDBG("%s responder ep %p state %u\n", + __FUNCTION__, ep, state_read(&ep->com)); + ep->com.rpl_done = 1; + wake_up(&ep->com.waitq); + } + } else { + PDBG("%s lsm ack ep %p state %u freeing skb\n", + __FUNCTION__, ep, state_read(&ep->com)); + kfree_skb(ep->mpa_skb); + ep->mpa_skb = NULL; } return 
CPL_RET_BUF_DONE; } @@ -1795,16 +1823,19 @@ int iwch_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) if (err) goto err; + /* if needed, wait for wr_ack */ + if (iwch_rqes_posted(qp)) { + wait_event(ep->com.waitq, ep->com.rpl_done); + err = ep->com.rpl_err; + if (err) + goto err; + } + err = send_mpa_reply(ep, conn_param->private_data, conn_param->private_data_len); if (err) goto err; - /* wait for wr_ack */ - wait_event(ep->com.waitq, ep->com.rpl_done); - err = ep->com.rpl_err; - if (err) - goto err; state_set(&ep->com, FPDU_MODE); established_upcall(ep); @@ -2033,7 +2064,6 @@ int iwch_ep_disconnect(struct iwch_ep *ep, int abrupt, gfp_t gfp) BUG(); break; } -out: spin_unlock_irqrestore(&ep->com.lock, flags); if (close) { if (abrupt) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h index a3fb959..c0978a8 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.h +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h @@ -226,5 +226,6 @@ int iwch_ep_redirect(void *ctx, struct dst_entry *old, struct dst_entry *new, st int __init iwch_cm_init(void); void __exit iwch_cm_term(void); +extern int peer2peer; #endif /* _IWCH_CM_H_ */ diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index 48833f3..ad77f05 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -118,6 +118,7 @@ enum IWCH_QP_FLAGS { }; struct iwch_mpa_attributes { + u8 initiator; u8 recv_marker_enabled; u8 xmit_marker_enabled; /* iWARP: enable inbound Read Resp. 
*/ u8 crc_enabled; @@ -322,6 +323,7 @@ enum iwch_qp_query_flags { IWCH_QP_QUERY_TEST_USERWRITE = 0x32 /* Test special */ }; +u16 iwch_rqes_posted(struct iwch_qp *qhp); int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr); int iwch_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr, @@ -331,6 +333,7 @@ int iwch_bind_mw(struct ib_qp *qp, struct ib_mw_bind *mw_bind); int iwch_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc); int iwch_post_terminate(struct iwch_qp *qhp, struct respQ_msg_t *rsp_msg); +int iwch_post_zb_read(struct iwch_qp *qhp); int iwch_register_device(struct iwch_dev *dev); void iwch_unregister_device(struct iwch_dev *dev); int iwch_quiesce_qps(struct iwch_cq *chp); diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index c02bb94..b0e5aea 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -586,6 +586,36 @@ static inline void build_term_codes(struct respQ_msg_t *rsp_msg, } } +int iwch_post_zb_read(struct iwch_qp *qhp) +{ + union t3_wr *wqe; + struct sk_buff *skb; + u8 flit_cnt = sizeof(struct t3_rdma_read_wr) >> 3; + + PDBG("%s enter\n", __FUNCTION__); + skb = alloc_skb(40, GFP_KERNEL); + if (!skb) { + printk(KERN_ERR "%s cannot send zb_read!!\n", __FUNCTION__); + return -ENOMEM; + } + wqe = (union t3_wr *)skb_put(skb, sizeof(struct t3_rdma_read_wr)); + memset(wqe, 0, sizeof(struct t3_rdma_read_wr)); + wqe->read.rdmaop = T3_READ_REQ; + wqe->read.reserved[0] = 0; + wqe->read.reserved[1] = 0; + wqe->read.reserved[2] = 0; + wqe->read.rem_stag = cpu_to_be32(1); + wqe->read.rem_to = cpu_to_be64(1); + wqe->read.local_stag = cpu_to_be32(1); + wqe->read.local_len = cpu_to_be32(0); + wqe->read.local_to = cpu_to_be64(1); + wqe->send.wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_READ)); + wqe->send.wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(qhp->ep->hwtid)| + V_FW_RIWR_LEN(flit_cnt)); + skb->priority = 
CPL_PRIORITY_DATA; + return cxgb3_ofld_send(qhp->rhp->rdev.t3cdev_p, skb); +} + /* * This posts a TERMINATE with layer=RDMA, type=catastrophic. */ @@ -671,11 +701,18 @@ static void flush_qp(struct iwch_qp *qhp, unsigned long *flag) /* - * Return non zero if at least one RECV was pre-posted. + * Return count of RECV WRs posted */ -static int rqes_posted(struct iwch_qp *qhp) +u16 iwch_rqes_posted(struct iwch_qp *qhp) { - return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV; + union t3_wr *wqe = qhp->wq.queue; + u16 count = 0; + while ((count+1) != 0 && fw_riwrh_opcode((struct fw_riwrh *)wqe) == T3_WR_RCV) { + count++; + wqe++; + } + PDBG("%s qhp %p count %u\n", __FUNCTION__, qhp, count); + return count; } static int rdma_init(struct iwch_dev *rhp, struct iwch_qp *qhp, @@ -716,8 +753,17 @@ static int rdma_init(struct iwch_dev *rhp, struct iwch_qp *qhp, init_attr.ird = qhp->attr.max_ird; init_attr.qp_dma_addr = qhp->wq.dma_addr; init_attr.qp_dma_size = (1UL << qhp->wq.size_log2); - init_attr.flags = rqes_posted(qhp) ? RECVS_POSTED : 0; + init_attr.rqe_count = iwch_rqes_posted(qhp); + init_attr.flags = qhp->attr.mpa_attr.initiator ? MPA_INITIATOR : 0; init_attr.flags |= capable(CAP_NET_BIND_SERVICE) ? 
PRIV_QP : 0; + if (peer2peer) { + init_attr.rtr_type = RTR_READ; + if (init_attr.ord == 0 && qhp->attr.mpa_attr.initiator) + init_attr.ord = 1; + if (init_attr.ird == 0 && !qhp->attr.mpa_attr.initiator) + init_attr.ird = 1; + } else + init_attr.rtr_type = 0; init_attr.irs = qhp->ep->rcv_seq; PDBG("%s init_attr.rq_addr 0x%x init_attr.rq_size = %d " "flags 0x%x qpcaps 0x%x\n", __FUNCTION__, diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h index 229303f..a0177fc 100644 --- a/drivers/net/cxgb3/version.h +++ b/drivers/net/cxgb3/version.h @@ -38,7 +38,7 @@ #define DRV_VERSION "1.0-ko" /* Firmware version */ -#define FW_VERSION_MAJOR 5 +#define FW_VERSION_MAJOR 6 #define FW_VERSION_MINOR 0 #define FW_VERSION_MICRO 0 #endif /* __CHELSIO_VERSION_H */ From rdreier at cisco.com Sun Apr 27 09:30:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 27 Apr 2008 09:30:47 -0700 Subject: [ofa-general] Re: [PATCH 1/2] IB/iSER: Do not add unsolicited data offset to VA in iSER header In-Reply-To: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> (Eli Dorfman's message of "Sun, 27 Apr 2008 15:53:19 +0300") References: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> Message-ID: So what was the conclusion on the right way to handle the change that affects on-the-wire data? Just have a flag day so targets either work with 2.6.25 and earlier initiators, or work with 2.6.26 and later initiators, and corrupt data if someone mixes things the wrong way? - R. 
From rdreier at cisco.com Sun Apr 27 09:31:08 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 27 Apr 2008 09:31:08 -0700 Subject: [ofa-general] [PATCH 2/2] IB/iSER: Use offset from r2t header for rdma In-Reply-To: <694d48600804270555i6ee55843x51c416294fec6397@mail.gmail.com> (Eli Dorfman's message of "Sun, 27 Apr 2008 15:55:00 +0300") References: <694d48600804270555i6ee55843x51c416294fec6397@mail.gmail.com> Message-ID: > usr/iscsi/iscsi_rdma.c | 16 +++++----------- I have no idea what tree this file lives in so I'll just ignore this patch, right? - R. From rdreier at cisco.com Sun Apr 27 09:34:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 27 Apr 2008 09:34:00 -0700 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. In-Reply-To: <20080427160010.31018.67436.stgit@dell3.ogc.int> (Steve Wise's message of "Sun, 27 Apr 2008 11:00:10 -0500") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160010.31018.67436.stgit@dell3.ogc.int> Message-ID: What are the interoperability implications of this? Looking closer I see that iw_nes has the send_first module parameter. How does this interact with that? I guess it's fine to apply this, but do we have a plan for how we want to handle this issue in the long-term? - R. From swise at opengridcomputing.com Sun Apr 27 09:44:43 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 11:44:43 -0500 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. In-Reply-To: References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160010.31018.67436.stgit@dell3.ogc.int> Message-ID: <4814AD7B.2060006@opengridcomputing.com> Roland Dreier wrote: > What are the interoperability implications of this? > > Looking closer I see that iw_nes has the send_first module parameter. > How does this interact with that? > It doesn't...yet. 
But we wanted to enable these applications for chelsio now and get the low-level fw and driver changes done first and tested. > I guess it's fine to apply this, but do we have a plan for how we want > to handle this issue in the long-term? > Yes! If you'll recall, we had a thread on the ofa general list discussing how to enhance the MPA negotiation so peers can indicate whether they want/need the RTR and what type of RTR (0B read, 0B write, or 0B send) should be sent. This will be done by standardizing a few bits of the private data in order to negotiate all this. The rdma-cma API will be extended so applications will have to request this peer-2-peer model, since it adds overhead to the connection setup. I plan to do this work for 2.6.27/ofed-1.4. I think it was listed in Felix's talk at Sonoma. This work (design, API, and code changes affecting core and placing requirements on iwarp providers) will be posted as RFC changes to get everyone's feedback as soon as I get something going. Does that sound ok? Steve. From erezz at voltaire.com Sun Apr 27 11:49:33 2008 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 27 Apr 2008 21:49:33 +0300 Subject: [ofa-general] [PATCH 2/2] IB/iSER: Use offset from r2t header for rdma References: <694d48600804270555i6ee55843x51c416294fec6397@mail.gmail.com> Message-ID: <39C75744D164D948A170E9792AF8E7CAF60D35@exil.voltaire.com> > > usr/iscsi/iscsi_rdma.c | 16 +++++----------- > > I have no idea what tree this file lives in so I'll just ignore this > patch, right? As Eli mentioned in PATCH 0/2, the patch set contains a fix for the initiator side and another fix for the iSER code in stgt. That's why Fujita Tomonori (who maintains stgt) is on the thread. Although the 2 fixes are for separate trees, it's a single logical change.
Erez From erezz at voltaire.com Sun Apr 27 11:53:41 2008 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 27 Apr 2008 21:53:41 +0300 Subject: [ofa-general] RE: [PATCH 1/2] IB/iSER: Do not add unsolicited data offset to VA in iSER header References: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> Message-ID: <39C75744D164D948A170E9792AF8E7CAF60D36@exil.voltaire.com> > So what was the conclusion on the right way to handle the change that > affects on-the-wire data? Just have a flag day so targets either work > with 2.6.25 and earlier initiators, or work with 2.6.26 and later > initiators, and corrupt data if someone mixes things the wrong way? See Eli's answer here: http://lists.openfabrics.org/pipermail/general/2008-April/049248.html
From ogerlitz at voltaire.com Mon Apr 28 00:09:56 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 28 Apr 2008 10:09:56 +0300 Subject: [ofa-general] Re: status of ofed ipoib changes which are not upstream In-Reply-To: <1209365678.22367.34.camel@mtls03> References: <1209286508.22367.5.camel@mtls03> <1209365678.22367.34.camel@mtls03> Message-ID: <48157844.6000909@voltaire.com> Eli Cohen wrote: >> ...looking closer, what happens if the send queue has less than 16 >> entries? (set with send_queue_size on module load) > I assumed that no one will want to use such a low number but surely > someone will do it ;-) How about using a set function for the module > parameter and allowing values >= 2 * MAX_SEND_CQE? > Or go simpler and have MAX_SEND_CQE be replaced by (say) MIN {16 , send_queue_size/4} Or From eli at mellanox.co.il Mon Apr 28 00:14:58 2008 From: eli at mellanox.co.il (Eli Cohen) Date: Mon, 28 Apr 2008 10:14:58 +0300 Subject: [ofa-general] Re: status of ofed ipoib changes which are not upstream In-Reply-To: <48157844.6000909@voltaire.com> References: <1209286508.22367.5.camel@mtls03> <1209365678.22367.34.camel@mtls03> <48157844.6000909@voltaire.com> Message-ID: <1209366898.22367.47.camel@mtls03> On Mon, 2008-04-28 at 10:09 +0300, Or Gerlitz wrote: > > > Or go simpler and have MAX_SEND_CQE be replaced by (say) MIN {16 , > send_queue_size/4} > But then I have to evaluate this expression in the fast path so I think we should put the limit at module initialization. 
From ogerlitz at voltaire.com Mon Apr 28 00:17:27 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 28 Apr 2008 10:17:27 +0300 Subject: [ofa-general] Re: split CQs for IPOIB UD In-Reply-To: <1209366898.22367.47.camel@mtls03> References: <1209286508.22367.5.camel@mtls03> <1209365678.22367.34.camel@mtls03> <48157844.6000909@voltaire.com> <1209366898.22367.47.camel@mtls03> Message-ID: <48157A07.1020905@voltaire.com> Eli Cohen wrote: >> Or go simpler and have MAX_SEND_CQE be replaced by (say) MIN {16 , >> send_queue_size/4} >> > But then I have to evaluate this expression in the fast path so I think > we should put the limit at module initialization. > Using a value V which equals MIN(A, B) which is not predefined does not mean you need to evaluate the MIN function each time you check if X > V; just compute V once and you are done. Or. From eli at mellanox.co.il Mon Apr 28 00:28:19 2008 From: eli at mellanox.co.il (Eli Cohen) Date: Mon, 28 Apr 2008 10:28:19 +0300 Subject: [ofa-general] Re: split CQs for IPOIB UD In-Reply-To: <48157A07.1020905@voltaire.com> References: <1209286508.22367.5.camel@mtls03> <1209365678.22367.34.camel@mtls03> <48157844.6000909@voltaire.com> <1209366898.22367.47.camel@mtls03> <48157A07.1020905@voltaire.com> Message-ID: <1209367699.22367.51.camel@mtls03> On Mon, 2008-04-28 at 10:17 +0300, Or Gerlitz wrote: > Eli Cohen wrote: > >> Or go simpler and have MAX_SEND_CQE be replaced by (say) MIN {16 , > >> send_queue_size/4} > >> > > But then I have to evaluate this expression in the fast path so I think > > we should put the limit at module initialization. > > > Using a value V which equals MIN(A, B) which is not predefined does > not mean you need to evaluate the MIN function each time you check if > X > V; just compute V once and you are done.
That's true, but then you have to read the calculated value from the memory location where you saved it, while in the case of a macro the compare value is placed in the code segment, at the same area where the code is read from. From eli at dev.mellanox.co.il Mon Apr 28 01:14:47 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Mon, 28 Apr 2008 11:14:47 +0300 Subject: [ofa-general] [PATCH v2] IB/ipoib: Split CQs for IPOIB UD Message-ID: <1209370487.11248.1.camel@mtls03> >From 3d87645b9209f95d374c455b3d7535673518b421 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Thu, 20 Mar 2008 16:35:30 +0200 Subject: [PATCH] IB/ipoib: Split CQs for IPOIB UD Use a dedicated CQ for UD send. Also, do not arm the UD send CQ, thus reducing the number of interrupts generated by the HCA. This patch further reduces overhead by not calling poll CQ for every posted send WR - it does it only when there are 16 or more outstanding work requests. Signed-off-by: Eli Cohen --- changes since the last commit (v1): make sure the tx ring size is at least twice MAX_SEND_CQE to ensure polling the send CQ is done before the tx ring is exhausted.
drivers/infiniband/ulp/ipoib/ipoib.h | 9 ++++-- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 8 ++-- drivers/infiniband/ulp/ipoib/ipoib_etool.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 45 ++++++++++++++++------------ drivers/infiniband/ulp/ipoib/ipoib_main.c | 3 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 39 ++++++++++++++++-------- 6 files changed, 65 insertions(+), 41 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 43feffc..fb28f0b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -95,6 +95,8 @@ enum { IPOIB_MCAST_FLAG_SENDONLY = 1, IPOIB_MCAST_FLAG_BUSY = 2, /* joining or already joined */ IPOIB_MCAST_FLAG_ATTACHED = 3, + + MAX_SEND_CQE = 16, }; #define IPOIB_OP_RECV (1ul << 31) @@ -285,7 +287,8 @@ struct ipoib_dev_priv { u16 pkey_index; struct ib_pd *pd; struct ib_mr *mr; - struct ib_cq *cq; + struct ib_cq *rcq; + struct ib_cq *scq; struct ib_qp *qp; u32 qkey; @@ -305,7 +308,8 @@ struct ipoib_dev_priv { struct ib_send_wr tx_wr; unsigned tx_outstanding; - struct ib_wc ibwc[IPOIB_NUM_WC]; + struct ib_wc ibwc[IPOIB_NUM_WC]; + struct ib_wc send_wc[MAX_SEND_CQE]; struct list_head dead_ahs; @@ -650,7 +654,6 @@ static inline int ipoib_register_debugfs(void) { return 0; } static inline void ipoib_unregister_debugfs(void) { } #endif - #define ipoib_printk(level, priv, format, arg...) \ printk(level "%s: " format, ((struct ipoib_dev_priv *) priv)->dev->name , ## arg) #define ipoib_warn(priv, format, arg...) 
\ diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 90ff2c9..dfabb38 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -249,8 +249,8 @@ static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev, struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_init_attr attr = { .event_handler = ipoib_cm_rx_event_handler, - .send_cq = priv->cq, /* For drain WR */ - .recv_cq = priv->cq, + .send_cq = priv->rcq, /* For drain WR */ + .recv_cq = priv->rcq, .srq = priv->cm.srq, .cap.max_send_wr = 1, /* For drain WR */ .cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */ @@ -951,8 +951,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_ { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_init_attr attr = { - .send_cq = priv->cq, - .recv_cq = priv->cq, + .send_cq = priv->rcq, + .recv_cq = priv->rcq, .srq = priv->cm.srq, .cap.max_send_wr = ipoib_sendq_size, .cap.max_send_sge = 1, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_etool.c b/drivers/infiniband/ulp/ipoib/ipoib_etool.c index a3ac4cf..b4f4f0f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_etool.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_etool.c @@ -73,7 +73,7 @@ static int ipoib_set_coalesce(struct net_device *dev, coal->rx_max_coalesced_frames > 0xffff) return -EINVAL; - ret = ib_modify_cq(priv->cq, coal->rx_max_coalesced_frames, + ret = ib_modify_cq(priv->rcq, coal->rx_max_coalesced_frames, coal->rx_coalesce_usecs); if (ret) { ipoib_dbg(priv, "failed modifying CQ\n"); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 5c61a81..8222b50 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -311,7 +311,6 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) struct ipoib_dev_priv *priv = netdev_priv(dev); unsigned int wr_id = wc->wr_id; struct 
ipoib_tx_buf *tx_req; - unsigned long flags; ipoib_dbg_data(priv, "send completion: id %d, status: %d\n", wr_id, wc->status); @@ -331,13 +330,11 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) dev_kfree_skb_any(tx_req->skb); - spin_lock_irqsave(&priv->tx_lock, flags); ++priv->tx_tail; if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && netif_queue_stopped(dev) && test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) netif_wake_queue(dev); - spin_unlock_irqrestore(&priv->tx_lock, flags); if (wc->status != IB_WC_SUCCESS && wc->status != IB_WC_WR_FLUSH_ERR) @@ -346,6 +343,17 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) wc->status, wr_id, wc->vendor_err); } +static int poll_tx(struct ipoib_dev_priv *priv) +{ + int n, i; + + n = ib_poll_cq(priv->scq, MAX_SEND_CQE, priv->send_wc); + for (i = 0; i < n; ++i) + ipoib_ib_handle_tx_wc(priv->dev, priv->send_wc + i); + + return n == MAX_SEND_CQE; +} + int ipoib_poll(struct napi_struct *napi, int budget) { struct ipoib_dev_priv *priv = container_of(napi, struct ipoib_dev_priv, napi); @@ -361,7 +369,7 @@ poll_more: int max = (budget - done); t = min(IPOIB_NUM_WC, max); - n = ib_poll_cq(priv->cq, t, priv->ibwc); + n = ib_poll_cq(priv->rcq, t, priv->ibwc); for (i = 0; i < n; i++) { struct ib_wc *wc = priv->ibwc + i; @@ -372,12 +380,8 @@ poll_more: ipoib_cm_handle_rx_wc(dev, wc); else ipoib_ib_handle_rx_wc(dev, wc); - } else { - if (wc->wr_id & IPOIB_OP_CM) - ipoib_cm_handle_tx_wc(dev, wc); - else - ipoib_ib_handle_tx_wc(dev, wc); - } + } else + ipoib_cm_handle_tx_wc(priv->dev, wc); } if (n != t) @@ -386,7 +390,7 @@ poll_more: if (done < budget) { netif_rx_complete(dev, napi); - if (unlikely(ib_req_notify_cq(priv->cq, + if (unlikely(ib_req_notify_cq(priv->rcq, IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)) && netif_rx_reschedule(dev, napi)) @@ -507,12 +511,17 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, address->last_send = priv->tx_head; 
++priv->tx_head; + skb_orphan(skb); if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); netif_stop_queue(dev); } } + + if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) + poll_tx(priv); + return; drop: @@ -665,7 +674,7 @@ void ipoib_drain_cq(struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); int i, n; do { - n = ib_poll_cq(priv->cq, IPOIB_NUM_WC, priv->ibwc); + n = ib_poll_cq(priv->rcq, IPOIB_NUM_WC, priv->ibwc); for (i = 0; i < n; ++i) { /* * Convert any successful completions to flush @@ -680,14 +689,12 @@ void ipoib_drain_cq(struct net_device *dev) ipoib_cm_handle_rx_wc(dev, priv->ibwc + i); else ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); - } else { - if (priv->ibwc[i].wr_id & IPOIB_OP_CM) - ipoib_cm_handle_tx_wc(dev, priv->ibwc + i); - else - ipoib_ib_handle_tx_wc(dev, priv->ibwc + i); - } + } else + ipoib_cm_handle_tx_wc(dev, priv->ibwc + i); } } while (n == IPOIB_NUM_WC); + + while(poll_tx(priv)); } int ipoib_ib_dev_stop(struct net_device *dev, int flush) @@ -779,7 +786,7 @@ timeout: msleep(1); } - ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP); + ib_req_notify_cq(priv->rcq, IB_CQ_NEXT_COMP); return 0; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 2f8a07d..b0633dd 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1291,7 +1291,8 @@ static int __init ipoib_init_module(void) ipoib_sendq_size = roundup_pow_of_two(ipoib_sendq_size); ipoib_sendq_size = min(ipoib_sendq_size, IPOIB_MAX_QUEUE_SIZE); - ipoib_sendq_size = max(ipoib_sendq_size, IPOIB_MIN_QUEUE_SIZE); + ipoib_sendq_size = max(ipoib_sendq_size, max(2 * MAX_SEND_CQE, + IPOIB_MIN_QUEUE_SIZE)); #ifdef CONFIG_INFINIBAND_IPOIB_CM ipoib_max_conn_qp = min(ipoib_max_conn_qp, IPOIB_CM_MAX_CONN_QP); #endif diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 
1d59f27..b9e7eab 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -171,26 +171,33 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) goto out_free_pd; } - size = ipoib_sendq_size + ipoib_recvq_size + 1; + size = ipoib_recvq_size + 1; ret = ipoib_cm_dev_init(dev); if (!ret) { + size += ipoib_sendq_size; if (ipoib_cm_has_srq(dev)) size += ipoib_recvq_size + 1; /* 1 extra for rx_drain_qp */ else size += ipoib_recvq_size * ipoib_max_conn_qp; } - priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0); - if (IS_ERR(priv->cq)) { - printk(KERN_WARNING "%s: failed to create CQ\n", ca->name); + priv->rcq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0); + if (IS_ERR(priv->rcq)) { + printk(KERN_WARNING "%s: failed to create receive CQ\n", ca->name); goto out_free_mr; } - if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP)) - goto out_free_cq; + priv->scq = ib_create_cq(priv->ca, NULL, NULL, dev, ipoib_sendq_size, 0); + if (IS_ERR(priv->scq)) { + printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name); + goto out_free_rcq; + } + + if (ib_req_notify_cq(priv->rcq, IB_CQ_NEXT_COMP)) + goto out_free_scq; - init_attr.send_cq = priv->cq; - init_attr.recv_cq = priv->cq; + init_attr.send_cq = priv->scq; + init_attr.recv_cq = priv->rcq; if (priv->hca_caps & IB_DEVICE_TCP_TSO) init_attr.create_flags = QP_CREATE_LSO; @@ -201,7 +208,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) priv->qp = ib_create_qp(priv->pd, &init_attr); if (IS_ERR(priv->qp)) { printk(KERN_WARNING "%s: failed to create QP\n", ca->name); - goto out_free_cq; + goto out_free_scq; } priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff; @@ -217,8 +224,11 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) return 0; -out_free_cq: - ib_destroy_cq(priv->cq); +out_free_scq: + ib_destroy_cq(priv->scq); + +out_free_rcq: + 
ib_destroy_cq(priv->rcq); out_free_mr: ib_dereg_mr(priv->mr); @@ -241,8 +251,11 @@ void ipoib_transport_dev_cleanup(struct net_device *dev) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); } - if (ib_destroy_cq(priv->cq)) - ipoib_warn(priv, "ib_cq_destroy failed\n"); + if (ib_destroy_cq(priv->scq)) + ipoib_warn(priv, "ib_cq_destroy (send) failed\n"); + + if (ib_destroy_cq(priv->rcq)) + ipoib_warn(priv, "ib_cq_destroy (recv) failed\n"); ipoib_cm_dev_cleanup(dev); -- 1.5.5 From jackm at dev.mellanox.co.il Mon Apr 28 04:38:28 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 28 Apr 2008 14:38:28 +0300 Subject: [ofa-general] [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting Message-ID: <200804281438.28417.jackm@dev.mellanox.co.il> mlx4-core: enable changing default max HCA resource limits. Enable module-initialization time modification of default HCA maximum resource limits via module parameters, as is done in mthca. Specify the log of the parameter value, rather than the value itself to avoid the hidden side-effect of rounding up values to next power-of-2. Signed-off-by: Jack Morgenstein --- Roland, This patch was first posted on Oct 16, 2007 (but got overlooked). I'm reposting its current incarnation, which applies to the OFED 1.4 driver as is currently on the OpenFabrics server (based on Kernel 2.6.25-rc7). Please queue up for kernel 2.6.26. Thanks! 
Jack Index: ofed_kernel/drivers/net/mlx4/main.c =================================================================== --- ofed_kernel.orig/drivers/net/mlx4/main.c 2007-10-29 10:22:34.771753000 +0200 +++ ofed_kernel/drivers/net/mlx4/main.c 2007-10-29 11:03:17.939875000 +0200 @@ -85,6 +85,56 @@ static struct mlx4_profile default_profi .num_mtt = 1 << 20, }; +static struct mlx4_profile mod_param_profile = { 0 }; + +module_param_named(log_num_qp, mod_param_profile.num_qp, int, 0444); +MODULE_PARM_DESC(log_num_qp, "log maximum number of QPs per HCA"); + +module_param_named(log_num_srq, mod_param_profile.num_srq, int, 0444); +MODULE_PARM_DESC(log_num_srq, "log maximum number of SRQs per HCA"); + +module_param_named(log_rdmarc_per_qp, mod_param_profile.rdmarc_per_qp, int, 0444); +MODULE_PARM_DESC(log_rdmarc_per_qp, "log number of RDMARC buffers per QP"); + +module_param_named(log_num_cq, mod_param_profile.num_cq, int, 0444); +MODULE_PARM_DESC(log_num_cq, "log maximum number of CQs per HCA"); + +module_param_named(log_num_mcg, mod_param_profile.num_mcg, int, 0444); +MODULE_PARM_DESC(log_num_mcg, "log maximum number of multicast groups per HCA"); + +module_param_named(log_num_mpt, mod_param_profile.num_mpt, int, 0444); +MODULE_PARM_DESC(log_num_mpt, + "log maximum number of memory protection table entries per HCA"); + +module_param_named(log_num_mtt, mod_param_profile.num_mtt, int, 0444); +MODULE_PARM_DESC(log_num_mtt, + "log maximum number of memory translation table segments per HCA"); + +static void process_mod_param_profile(void) +{ + default_profile.num_qp = (mod_param_profile.num_qp ? + 1 << mod_param_profile.num_qp : + default_profile.num_qp); + default_profile.num_srq = (mod_param_profile.num_srq ? + 1 << mod_param_profile.num_srq : + default_profile.num_srq); + default_profile.rdmarc_per_qp = (mod_param_profile.rdmarc_per_qp ? + 1 << mod_param_profile.rdmarc_per_qp : + default_profile.rdmarc_per_qp); + default_profile.num_cq = (mod_param_profile.num_cq ? 
+ 1 << mod_param_profile.num_cq : + default_profile.num_cq); + default_profile.num_mcg = (mod_param_profile.num_mcg ? + 1 << mod_param_profile.num_mcg : + default_profile.num_mcg); + default_profile.num_mpt = (mod_param_profile.num_mpt ? + 1 << mod_param_profile.num_mpt : + default_profile.num_mpt); + default_profile.num_mtt = (mod_param_profile.num_mtt ? + 1 << mod_param_profile.num_mtt : + default_profile.num_mtt); +} + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; @@ -514,6 +564,7 @@ static int __devinit mlx4_init_hca(struc goto err_stop_fw; } + process_mod_param_profile(); profile = default_profile; icm_size = mlx4_make_profile(dev, &profile, &dev_cap, &init_hca); From tziporet at dev.mellanox.co.il Mon Apr 28 04:52:56 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 28 Apr 2008 14:52:56 +0300 Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: References: Message-ID: <4815BA98.8000802@mellanox.co.il> Xavier Andrade wrote: > On Wed, 23 Apr 2008, Roland Dreier wrote: > >> Hmm, not sure... let's see what the Mellanox guys say (they're mostly on >> vacation this week so it might be a few days). We are back :-) > > I can't locate the correct firmware, the PSID reported by mstflint > corresponds to an Intel one: > > Image type: Failsafe > I.S. Version: 1 > Chip Revision: A0 > Description: Node Port1 Sys image > GUIDs: 0002c9020022baa4 0002c9020022baa5 0002c9020022baa7 > Board ID: (INT0010000001) > VSD: > PSID: INT0010000001 > > But I haven't been able to find any firmware on Intel's webpage. > > Do you think that I could use a Mellanox firmware? Which one? There > are three different ones for the MT25204. > Attached is the ini file for this PSID. Please create a binary using the MFT package on our web site and try to burn it. If you have more issues please work with Todd, who is cc'd on this mail. Tziporet -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: D51144_A1-A3.ini URL: From dorfman.eli at gmail.com Mon Apr 28 05:01:33 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Mon, 28 Apr 2008 15:01:33 +0300 Subject: [ofa-general] [PATCH] IB/iSER: Move high-volume debug output to higher debug levels Message-ID: <694d48600804280501q3cf74a10p2e1b73b4ac0d3d27@mail.gmail.com> Add more levels for debug. Signed-off-by: Eli Dorfman --- drivers/infiniband/ulp/iser/iscsi_iser.c | 5 ++--- drivers/infiniband/ulp/iser/iscsi_iser.h | 7 +++++++ drivers/infiniband/ulp/iser/iser_memory.c | 7 +++++-- 3 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index be1b9fb..451e601 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -78,15 +78,14 @@ static unsigned int iscsi_max_lun = 512; module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO); int iser_debug_level = 0; +module_param_named(debug_level, iser_debug_level, int, S_IRUGO|S_IWUSR|S_IWGRP); +MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)"); MODULE_DESCRIPTION("iSER (iSCSI Extensions for RDMA) Datamover " "v" DRV_VER " (" DRV_DATE ")"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Alex Nezhinsky, Dan Bar Dov, Or Gerlitz"); -module_param_named(debug_level, iser_debug_level, int, 0644); -MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)"); - struct iser_global ig; void diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 1ee867b..a8c1b30 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -71,6 +71,13 @@ #define iser_dbg(fmt, arg...) \ do { \ + if (iser_debug_level > 1) \ + printk(KERN_DEBUG PFX "%s:" fmt,\ + __func__ , ## arg); \ + } while (0) + +#define iser_warn(fmt, arg...) 
\ + do { \ if (iser_debug_level > 0) \ printk(KERN_DEBUG PFX "%s:" fmt,\ __func__ , ## arg); \ diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 4a17743..ee58199 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -334,8 +334,11 @@ static void iser_data_buf_dump(struct iser_data_buf *data, struct scatterlist *sg; int i; + if (iser_debug_level == 0) + return; + for_each_sg(sgl, sg, data->dma_nents, i) - iser_err("sg[%d] dma_addr:0x%lX page:0x%p " + iser_warn("sg[%d] dma_addr:0x%lX page:0x%p " "off:0x%x sz:0x%x dma_len:0x%x\n", i, (unsigned long)ib_sg_dma_address(ibdev, sg), sg_page(sg), sg->offset, @@ -434,7 +437,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, aligned_len = iser_data_buf_aligned_len(mem, ibdev); if (aligned_len != mem->dma_nents) { - iser_err("rdma alignment violation %d/%d aligned\n", + iser_warn("rdma alignment violation %d/%d aligned\n", aligned_len, mem->size); iser_data_buf_dump(mem, ibdev); -- 1.5.5 This patch was made against 2.6.26 branch. Since this includes minor changes please try to push it to 2.6.26. Otherwise this can go to 2.6.27. From dorfman.eli at gmail.com Mon Apr 28 05:10:16 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Mon, 28 Apr 2008 15:10:16 +0300 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations Message-ID: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> Add read only module param to count alignment violations. In case of unaligned pages iSER allocates memory and copies the data to the new memory. 
Signed-off-by: Eli Dorfman --- drivers/infiniband/ulp/iser/iscsi_iser.c | 3 +++ drivers/infiniband/ulp/iser/iscsi_iser.h | 1 + drivers/infiniband/ulp/iser/iser_memory.c | 1 + 3 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 451e601..5181a1e 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -77,6 +77,9 @@ static unsigned int iscsi_max_lun = 512; module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO); +unsigned int iser_unaligned_cnt = 0; +module_param_named(unaligned_cnt, iser_unaligned_cnt, uint, S_IRUGO); + int iser_debug_level = 0; module_param_named(debug_level, iser_debug_level, int, S_IRUGO|S_IWUSR|S_IWGRP); MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)"); diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index a8c1b30..4a39a38 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -294,6 +294,7 @@ struct iser_global { extern struct iser_global ig; extern int iser_debug_level; +extern unsigned int iser_unaligned_cnt; /* allocate connection resources needed for rdma functionality */ int iser_conn_set_full_featured_mode(struct iscsi_conn *conn); diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index ee58199..0f0fcb3 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -437,6 +437,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, aligned_len = iser_data_buf_aligned_len(mem, ibdev); if (aligned_len != mem->dma_nents) { + iser_unaligned_cnt++; iser_warn("rdma alignment violation %d/%d aligned\n", aligned_len, mem->size); iser_data_buf_dump(mem, ibdev); -- 1.5.5 This patch was made against 2.6.26 branch. 
Since it includes minor changes, please try to push it to 2.6.26. Otherwise this can go to 2.6.27. From Arkady.Kanevsky at netapp.com Mon Apr 28 06:51:11 2008 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Mon, 28 Apr 2008 09:51:11 -0400 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. In-Reply-To: <4814AD7B.2060006@opengridcomputing.com> References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160010.31018.67436.stgit@dell3.ogc.int> <4814AD7B.2060006@opengridcomputing.com> Message-ID: I expect it to be tested at the Sept interop event. If it works, then I will send a proposal to IETF for the MPA enhancement. Thanks, Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Sunday, April 27, 2008 12:45 PM > To: Roland Dreier > Cc: netdev at vger.kernel.org; general at lists.openfabrics.org; > linux-kernel at vger.kernel.org; divy at chelsio.com > Subject: Re: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: > Support peer-2-peer connection setup. > > > > Roland Dreier wrote: > > What are the interoperability implications of this? > > > > Looking closer I see that iw_nes has the send_first module > parameter. > > How does this interact with that? > > > > It doesn't...yet. But we wanted to enable these applications > for chelsio now and get the low level fw and driver changes > done first and tested. > > > I guess it's fine to apply this, but do we have a plan for > how we want > > to handle this issue in the long-term? > > > > Yes! If you'll recall, we had a thread on the ofa general > list discussing how to enhance the MPA negotiation so peers > can indicate whether they want/need the RTR and what type of > RTR (0B read, 0B write, or 0B send) should be sent. 
This > will be done by standardizing a few bits of the private data > in order to negotiate all this. The rdma-cma API will be > extended so applications will have to request this > peer-2-peer model since it adds overhead to the connection setup. > > I plan to do this work for 2.6.27/ofed-1.4. I think it was > listed in Felix's talk at Sonoma. This work (design, API, > and code changes affecting core and placing requirements on > iwarp providers) will be posted as RFC changes to get > everyones feedback as soon as I get something going. > > Does that sound ok? > > > Steve. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From mtosatti at redhat.com Mon Apr 28 07:48:55 2008 From: mtosatti at redhat.com (Marcelo Tosatti) Date: Mon, 28 Apr 2008 11:48:55 -0300 Subject: [ofa-general] Re: [kvm-devel] mmu notifier #v14 In-Reply-To: <20080427030514.GM9514@duo.random> References: <20080426164511.GJ9514@duo.random> <48137B8B.7010202@us.ibm.com> <20080427002019.GL9514@duo.random> <4813DCCF.3020201@codemonkey.ws> <20080427030514.GM9514@duo.random> Message-ID: <20080428144855.GA1702@dmt> Hi Andrea, Looks good. Acked-by: Marcelo Tosatti From hrosenstock at xsigo.com Mon Apr 28 08:02:55 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 28 Apr 2008 08:02:55 -0700 Subject: [ofa-general] [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <200804281438.28417.jackm@dev.mellanox.co.il> References: <200804281438.28417.jackm@dev.mellanox.co.il> Message-ID: <1209394975.689.386.camel@hrosenstock-ws.xsigo.com> Jack, On Mon, 2008-04-28 at 14:38 +0300, Jack Morgenstein wrote: > mlx4-core: enable changing default max HCA resource limits. 
> > Enable module-initialization time modification of default HCA > maximum resource limits via module parameters, as is done in mthca. > > Specify the log of the parameter value, rather than the value itself > to avoid the hidden side-effect of rounding up values to next power-of-2. This is much needed; thanks! One minor comment: In places where there are reserved resources (like qps, srqs, others ?), should it be ensured that the parameters set are above the logs of those amounts so the user doesn't shoot themselves in the foot by accident ? Or perhaps a little more on the ranges in the mod param descriptions ? -- Hal > Signed-off-by: Jack Morgenstein > > --- > > Roland, > This patch was first posted on Oct 16, 2007 (but got overlooked). > > I'm reposting its current incarnation, which applies to the OFED 1.4 driver > as is currently on the OpenFabrics server (based on Kernel 2.6.25-rc7). > > Please queue up for kernel 2.6.26. > Thanks! > Jack > > Index: ofed_kernel/drivers/net/mlx4/main.c > =================================================================== > --- ofed_kernel.orig/drivers/net/mlx4/main.c 2007-10-29 10:22:34.771753000 +0200 > +++ ofed_kernel/drivers/net/mlx4/main.c 2007-10-29 11:03:17.939875000 +0200 > @@ -85,6 +85,56 @@ static struct mlx4_profile default_profi > .num_mtt = 1 << 20, > }; > > +static struct mlx4_profile mod_param_profile = { 0 }; > + > +module_param_named(log_num_qp, mod_param_profile.num_qp, int, 0444); > +MODULE_PARM_DESC(log_num_qp, "log maximum number of QPs per HCA"); > + > +module_param_named(log_num_srq, mod_param_profile.num_srq, int, 0444); > +MODULE_PARM_DESC(log_num_srq, "log maximum number of SRQs per HCA"); > + > +module_param_named(log_rdmarc_per_qp, mod_param_profile.rdmarc_per_qp, int, 0444); > +MODULE_PARM_DESC(log_rdmarc_per_qp, "log number of RDMARC buffers per QP"); > + > +module_param_named(log_num_cq, mod_param_profile.num_cq, int, 0444); > +MODULE_PARM_DESC(log_num_cq, "log maximum number of CQs per HCA"); > 
+ > +module_param_named(log_num_mcg, mod_param_profile.num_mcg, int, 0444); > +MODULE_PARM_DESC(log_num_mcg, "log maximum number of multicast groups per HCA"); > + > +module_param_named(log_num_mpt, mod_param_profile.num_mpt, int, 0444); > +MODULE_PARM_DESC(log_num_mpt, > + "log maximum number of memory protection table entries per HCA"); > + > +module_param_named(log_num_mtt, mod_param_profile.num_mtt, int, 0444); > +MODULE_PARM_DESC(log_num_mtt, > + "log maximum number of memory translation table segments per HCA"); > + > +static void process_mod_param_profile(void) > +{ > + default_profile.num_qp = (mod_param_profile.num_qp ? > + 1 << mod_param_profile.num_qp : > + default_profile.num_qp); > + default_profile.num_srq = (mod_param_profile.num_srq ? > + 1 << mod_param_profile.num_srq : > + default_profile.num_srq); > + default_profile.rdmarc_per_qp = (mod_param_profile.rdmarc_per_qp ? > + 1 << mod_param_profile.rdmarc_per_qp : > + default_profile.rdmarc_per_qp); > + default_profile.num_cq = (mod_param_profile.num_cq ? > + 1 << mod_param_profile.num_cq : > + default_profile.num_cq); > + default_profile.num_mcg = (mod_param_profile.num_mcg ? > + 1 << mod_param_profile.num_mcg : > + default_profile.num_mcg); > + default_profile.num_mpt = (mod_param_profile.num_mpt ? > + 1 << mod_param_profile.num_mpt : > + default_profile.num_mpt); > + default_profile.num_mtt = (mod_param_profile.num_mtt ? 
> + 1 << mod_param_profile.num_mtt : > + default_profile.num_mtt); > +} > + > static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) > { > int err; > @@ -514,6 +564,7 @@ static int __devinit mlx4_init_hca(struc > goto err_stop_fw; > } > > + process_mod_param_profile(); > profile = default_profile; > > icm_size = mlx4_make_profile(dev, &profile, &dev_cap, &init_hca); > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From David.Chevalier at ge.com Mon Apr 28 08:13:36 2008 From: David.Chevalier at ge.com (Chevalier, David (GE Healthcare)) Date: Mon, 28 Apr 2008 11:13:36 -0400 Subject: [ofa-general] SDP poll() behavior Message-ID: <68D58DEFB8673048A64DE1FBE56BEE1807CF50EC@CINMLVEM11.e2k.ad.ge.com> Hi SDP developers, I've noticed an apparent difference between SDP and TCP/IP handling of a certain scenario (OFED 1.3), not necessarily a bug, but wondering if it might be better to behave more like TCP/IP in this case: receiver and sender use non-blocking sockets (SDP) and monitor through poll() sender writes a known quantity of data through many calls to write(), then closes its side of the socket. receiver polls the socket, and reads the data through many calls to read(), then closes its socket. receiver is monitoring poll() revents for POLLERR, POLLHUP and POLLIN On the receiver's last expected pass through the poll() loop to read() the last remaining data, I'll often get revents of {POLLERR|POLLHUP|POLLIN}, likely due to the sender closing its socket after the last write(). If my poll() handling loop goes in this order: check/handle POLLERR check/handle POLLHUP check/handle POLLIN then it fails, because I don't expect to be able to read() data when poll() returns POLLERR or POLLHUP. 
If I change the order and handle POLLIN first, then read() works and gets the last data. I've never encountered this in TCP/IP - that is to say, for TCP/IP I first receive a clean POLLIN from poll(), then the next poll() (after I read() the data) returns POLLHUP (without the POLLERR). If I get POLLERR from poll(), I'd expect a subsequent call to read() to return an error, not valid data... While this is probably an "implementation defined" behavior, it seems like a good idea to try to behave the same as the TCP/IP sockets that SDP aims to replace... Regards, Dave From jackm at dev.mellanox.co.il Mon Apr 28 08:20:21 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 28 Apr 2008 18:20:21 +0300 Subject: [ofa-general] [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <1209394975.689.386.camel@hrosenstock-ws.xsigo.com> References: <200804281438.28417.jackm@dev.mellanox.co.il> <1209394975.689.386.camel@hrosenstock-ws.xsigo.com> Message-ID: <200804281820.21672.jackm@dev.mellanox.co.il> On Monday 28 April 2008 18:02, Hal Rosenstock wrote: > In places where there are reserved resources (like qps, srqs, others ?), > should it be ensured that the parameters set are above the logs of those > amounts so the user doesn't shoot themselves in the foot by accident ? > No worry there. The reserved resources are subtracted from the above log amounts (when expressed as a power-of-2: 1UL << log), and the resulting amounts ( - reserved) are returned to the user as the device limits. (check this out using ibv_devinfo) Thus, the user CANNOT "shoot themselves in the foot". - Jack (P.S. this patch is in OFED 1.3 -- do "modinfo mlx4_core" to see the above module parameters ). 
From olga.shern at gmail.com Mon Apr 28 08:22:23 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Mon, 28 Apr 2008 18:22:23 +0300 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: Hi Tziporet, I was on vacation, therefore couldn't attend this meeting, and I want to update about Voltaire's plans for OFED 1.3.1. We are working on bug fixes for Bonding and HA , minimal impact on traffic, multicast and partitioning during SM failover. Also it is very important for us that IPoIB 2 kernel panics will be fixed ( https://bugs.openfabrics.org/show_bug.cgi?id=989, https://bugs.openfabrics.org/show_bug.cgi?id=985) Best Regards, Olga On 4/22/08, Tziporet Koren wrote: > > OFED April 21 meeting summary about 1.3.1 plans and OFED 1.4 development: > > 1. OFED 1.3.1: > > 1.1 Planned changes: > > ULPs changes: > > IB-bonding - done > SRP failover - on work > SDP crashes - on work > RDS fixes for RDMA API - done > librdmacm 1.0.7 - done > Open MPI 1.2.6 - done > uDAPL - on work > > Low level drivers: - each HW vendor should reply when the > changes will be ready > > nes - will be ready on first week of May > mlx4 - fixes are ready; changes to support Eth are under > review of the submission to kernel so not clear if they will make it on > time. > > cxgb3 - will be ready by middle of may. 
Majority of > changes should be submitted for RC1. > ipath - wait for update from Betsy > ehca - wait for update from Christoph > > 1.2 Schedule: we agreed that 2 release candidate should be > sufficient > > GA is planned for May-29 > - RC1 - May 6 > - RC2 - May 20 > > Note: daily builds of 1.3.1 are already available at: * > http://www.openfabrics.org/builds/ofed-1.3.1* > > > 2. OFED 1.4: > > Release features were presented at Sonoma (presentation available at > *http://www.openfabrics.org/archives/april2008sonoma.htm* > ) > > IPv6: Woody is looking for resources to add IPv6 support to the CMA. > Hal noted that it will require a change in opensm too. > > Xsigo Vnic & Vhba - Not clear if they will make it > > Kernel tree is under work at: git:// > git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel > We should try to get the kernel code to compile as soon as possible > so everybody will be able to contribute code. > > Schedule reminder: > ============== > Release: Oct 06, 2008 > Features freeze: Jun 25, 08 (kernel 2.6.26 based) > Alpha: Jul 9, 08 > Beta: Jul 30, 08 kernel 2.6.27-rcX (assuming it will be available) > RC1: Aug 13, 08 > RC2: Aug 27, 08 > RC3-RC5/6 – every 5-10 days > Latest RC to be used in OFA interop event > GA: Oct 06 08 > > > Tziporet > > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hrosenstock at xsigo.com Mon Apr 28 08:50:13 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 28 Apr 2008 08:50:13 -0700 Subject: [ofa-general] [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <200804281820.21672.jackm@dev.mellanox.co.il> References: <200804281438.28417.jackm@dev.mellanox.co.il> <1209394975.689.386.camel@hrosenstock-ws.xsigo.com> <200804281820.21672.jackm@dev.mellanox.co.il> Message-ID: <1209397813.689.395.camel@hrosenstock-ws.xsigo.com> On Mon, 2008-04-28 at 18:20 +0300, Jack Morgenstein wrote: > On Monday 28 April 2008 18:02, Hal Rosenstock wrote: > > In places where there are reserved resources (like qps, srqs, others ?), > > should it be ensured that the parameters set are above the logs of those > > amounts so the user doesn't shoot themselves in the foot by accident ? > > > No worry there. The reserved resources are subtracted from the above > log amounts (when expressed as a power-of-2: 1UL << log), and the > resulting amounts ( - reserved) returned to the user as > the device limits. > (check this out using ibv_devinfo) > > Thus, the user CANNOT "shoot themselves in the foot". Right; that accounts for the reserved ones but what happens if they mistakenly set something like log_num_qp = 5 where the total number is less than the reserved number ? Should this be protected against in some way (friendly error or bump to minimum needed) and/or indicate a minimum in the mod param description ? > - Jack > > (P.S. this patch is in OFED 1.3 Also, OFED 1.2.5.4 > -- do "modinfo mlx4_core" to see the above module parameters ). Thanks. 
-- Hal From rdreier at cisco.com Mon Apr 28 08:50:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 08:50:25 -0700 Subject: [ofa-general] Re: [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <200804281438.28417.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Mon, 28 Apr 2008 14:38:28 +0300") References: <200804281438.28417.jackm@dev.mellanox.co.il> Message-ID: Hmm... wouldn't it be better to follow the same interface as ib_mthca and have consumers pass in the numbers instead of the log sizes? Having two different ways of changing the same parameters seems pretty confusing. From rdreier at cisco.com Mon Apr 28 08:51:10 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 08:51:10 -0700 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations In-Reply-To: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> (Eli Dorfman's message of "Mon, 28 Apr 2008 15:10:16 +0300") References: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> Message-ID: > Add read only module param to count alignment violations. I don't think a module parameter is the way to report statistics from the kernel. Can't you just add a device attribute or something? Or stick a file in debugfs? - R. From rdreier at cisco.com Mon Apr 28 08:57:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 08:57:12 -0700 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: (Olga Shern's message of "Mon, 28 Apr 2008 18:22:23 +0300") References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: > Also it is very important for us that IPoIB 2 kernel panics will be fixed ( > https://bugs.openfabrics.org/show_bug.cgi?id=989, > https://bugs.openfabrics.org/show_bug.cgi?id=985) Are either of these panics seen with upstream kernels? 
If we don't know then this points to a serious problem with the OFED model: we are diluting testing resources from the upstream kernel, which hurts the quality of the kernel that most users get from their distro. From olga.shern at gmail.com Mon Apr 28 09:14:39 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Mon, 28 Apr 2008 19:14:39 +0300 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: On 4/28/08, Roland Dreier wrote: > > > Also it is very important for us that IPoIB 2 kernel panics will be > fixed ( > > https://bugs.openfabrics.org/show_bug.cgi?id=989, > > https://bugs.openfabrics.org/show_bug.cgi?id=985) > > Are either of these panics seen with upstream kernels? > > https://bugs.openfabrics.org/show_bug.cgi?id=989 is OFED bug https://bugs.openfabrics.org/show_bug.cgi?id=985 we will try to reproduce it on upstream kernel and let you know -------------- next part -------------- An HTML attachment was scrubbed... URL: From weiny2 at llnl.gov Mon Apr 28 09:19:23 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 28 Apr 2008 09:19:23 -0700 Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.) In-Reply-To: <48143DBA.3080701@voltaire.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> <20080424181657.28d58a29.weiny2@llnl.gov> <48143DBA.3080701@voltaire.com> Message-ID: <20080428091923.0abf9fb5.weiny2@llnl.gov> On Sun, 27 Apr 2008 11:47:54 +0300 Or Gerlitz wrote: > Ira Weiny wrote: > > > > I did not get any output with multicast_debug_level! 
> why should you, as from the node's point of view nothing has happened > (the exact param name is mcast_debug_level) > > > > Here is a patch which fixes the problem. (At least with the partial sub-nets > > configuration I explained before.) I will have to verify this fixes the problem > > I originally reported. > OK, good. Does this problem exist in the released openSM? if yes, what > would be the trigger for the SM to "really discover" (i.e do PortInfo > SET) this sub-fabric and how much time would it take to reach this > trigger, worst case wise? Yes, this is in the current released version of OpenSM, AFAICT. The trigger is: the single link separating the partial sub net will come up and that trap will cause OpenSM to resweep. I believe this will happen on the next resweep cycle which is by default 10 sec. (But this is configurable.) I don't think there is an issue with allowing OpenSM to resweep as designed. > > The failure configuration you have set to reproduce the problem is very > untypical, I think. I agree. I made a patch to turn off the processing of MAD's in the kernel to test my original theory, that the node is not responding to MAD's. Using this patch I have been able to verify that if a node stops responding that the rereg is sent by OpenSM when the node comes back. See my next email response to Sasha concerning the original issue. Ira > > Since under common clos etc topologies which don't > have a 1:n blocking nature, failure of such link would cause re-route > etc by the SM which would not (and should not) be noted by the nodes (I > hope I am not falling into another problem here...) > > Or. > > > From HNGUYEN at de.ibm.com Mon Apr 28 09:36:20 2008 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Mon, 28 Apr 2008 18:36:20 +0200 Subject: [ofa-general] Re: [PATCH] ehca: ret is unsigned, ibmebus_request_irq() negative return ignored in hca_create_eq() In-Reply-To: <480FA529.2030800@tiscali.nl> Message-ID: Hello Roel! Thanks for pointing this out. 
Will send another version with a more consistent naming convention for the return variable from firmware. We used to name it h_ret. Regards Nam Roel Kluin <12o3l at tiscali.nl> wrote on 23.04.2008 23:07:53: > diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c > b/drivers/infiniband/hw/ehca/ehca_eq.c > index b4ac617..9727235 100644 > --- a/drivers/infiniband/hw/ehca/ehca_eq.c > +++ b/drivers/infiniband/hw/ehca/ehca_eq.c > @@ -59,6 +59,7 @@ int ehca_create_eq(struct ehca_shca *shca, > u32 i; > void *vpage; > struct ib_device *ib_dev = &shca->ib_device; > + int ret2; > > spin_lock_init(&eq->spinlock); > spin_lock_init(&eq->irq_spinlock); > @@ -123,18 +124,18 @@ int ehca_create_eq(struct ehca_shca *shca, > > /* register interrupt handlers and initialize work queues */ > if (type == EHCA_EQ) { > - ret = ibmebus_request_irq(eq->ist, ehca_interrupt_eq, > + ret2 = ibmebus_request_irq(eq->ist, ehca_interrupt_eq, > IRQF_DISABLED, "ehca_eq", > (void *)shca); > - if (ret < 0) > + if (ret2 < 0) > ehca_err(ib_dev, "Can't map interrupt handler."); > > tasklet_init(&eq->interrupt_task, ehca_tasklet_eq, (long)shca); > } else if (type == EHCA_NEQ) { > - ret = ibmebus_request_irq(eq->ist, ehca_interrupt_neq, > + ret2 = ibmebus_request_irq(eq->ist, ehca_interrupt_neq, > IRQF_DISABLED, "ehca_neq", > (void *)shca); > - if (ret < 0) > + if (ret2 < 0) > ehca_err(ib_dev, "Can't map interrupt handler."); > > tasklet_init(&eq->interrupt_task, ehca_tasklet_neq, (long)shca); From akepner at sgi.com Mon Apr 28 09:37:31 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Mon, 28 Apr 2008 09:37:31 -0700 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: <20080428163731.GL30919@sgi.com> On Mon, Apr 28, 2008 at 07:14:39PM +0300, Olga Shern (Voltaire) wrote: > ...
> https://bugs.openfabrics.org/show_bug.cgi?id=985 we will try to reproduce > it on upstream kernel and let you know I just saw this bug report today, but we've had similar crashes. Looks like the problem is that in ipoib_neigh_cleanup() this is done (no locking): neigh = *to_ipoib_neigh(n); then later: spin_lock_irqsave(&priv->lock, flags); if (neigh->ah) ah = neigh->ah; list_del(&neigh->list); <---- neigh may be stale now ipoib_neigh_free(n->dev, neigh); spin_unlock_irqrestore(&priv->lock, flags); neigh wasn't re-read after acquiring the lock, so it may point to an already freed data structure. In our crashes we had backtraces like: RIP: ib_ipoib:ipoib_neigh_cleanup+368 neigh_destroy+197 neigh_periodic_timer+249 neigh_periodic_timer+0 run_timer_softirq+348 __do_softirq+85 call_softirq+30 do_softirq+44 ..... And the following helpful hint: Unable to handle kernel paging request at 0000000000100108 ^^^^^^^^^^^^^^^^ LIST_POISON1 + 0x8 So we were dying in the midst of list_del(). -- Arthur From hnguyen at linux.vnet.ibm.com Mon Apr 28 09:47:44 2008 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Mon, 28 Apr 2008 18:47:44 +0200 Subject: [ofa-general] IB/ehca: handle negative return value from ibmebus_request_irq() properly in ehca_create_eq() Message-ID: <200804281847.44968.hnguyen@linux.vnet.ibm.com> Signed-off-by: Hoang-Nam Nguyen --- drivers/infiniband/hw/ehca/ehca_eq.c | 35 ++++++++++++++++----------------- 1 files changed, 17 insertions(+), 18 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c b/drivers/infiniband/hw/ehca/ehca_eq.c index b4ac617..49660df 100644 --- a/drivers/infiniband/hw/ehca/ehca_eq.c +++ b/drivers/infiniband/hw/ehca/ehca_eq.c @@ -54,7 +54,8 @@ int ehca_create_eq(struct ehca_shca *shca, struct ehca_eq *eq, const enum ehca_eq_type type, const u32 length) { - u64 ret; + int ret; + u64 h_ret; u32 nr_pages; u32 i; void *vpage; @@ -73,15 +74,15 @@ int ehca_create_eq(struct ehca_shca *shca, return -EINVAL; } - ret = 
hipz_h_alloc_resource_eq(shca->ipz_hca_handle, - &eq->pf, - type, - length, - &eq->ipz_eq_handle, - &eq->length, - &nr_pages, &eq->ist); + h_ret = hipz_h_alloc_resource_eq(shca->ipz_hca_handle, + &eq->pf, + type, + length, + &eq->ipz_eq_handle, + &eq->length, + &nr_pages, &eq->ist); - if (ret != H_SUCCESS) { + if (h_ret != H_SUCCESS) { ehca_err(ib_dev, "Can't allocate EQ/NEQ. eq=%p", eq); return -EINVAL; } @@ -97,24 +98,22 @@ int ehca_create_eq(struct ehca_shca *shca, u64 rpage; vpage = ipz_qpageit_get_inc(&eq->ipz_queue); - if (!vpage) { - ret = H_RESOURCE; + if (!vpage) goto create_eq_exit2; - } rpage = virt_to_abs(vpage); - ret = hipz_h_register_rpage_eq(shca->ipz_hca_handle, - eq->ipz_eq_handle, - &eq->pf, - 0, 0, rpage, 1); + h_ret = hipz_h_register_rpage_eq(shca->ipz_hca_handle, + eq->ipz_eq_handle, + &eq->pf, + 0, 0, rpage, 1); if (i == (nr_pages - 1)) { /* last page */ vpage = ipz_qpageit_get_inc(&eq->ipz_queue); - if (ret != H_SUCCESS || vpage) + if (h_ret != H_SUCCESS || vpage) goto create_eq_exit2; } else { - if (ret != H_PAGE_REGISTERED || !vpage) + if (h_ret != H_PAGE_REGISTERED || !vpage) goto create_eq_exit2; } } -- 1.5.5 From arlin.r.davis at intel.com Mon Apr 28 10:13:40 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 28 Apr 2008 10:13:40 -0700 Subject: [ofa-general] [PATCH] [dat1.2] dapl cma: add check before destroying cm event channel in release Message-ID: library may be loaded and unloaded without calling open in which case the cm event channel is not created. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index 5f4fbd0..56c0a05 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -189,7 +189,8 @@ int32_t dapls_ib_release(void) { dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapl_ib_release: \n"); dapli_ib_thread_destroy(); - rdma_destroy_event_channel(g_cm_events); + if (g_cm_events != NULL) + rdma_destroy_event_channel(g_cm_events); return 0; } -- 1.5.2.5 -------------- next part -------------- An HTML attachment was scrubbed... URL: From arlin.r.davis at intel.com Mon Apr 28 10:13:42 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 28 Apr 2008 10:13:42 -0700 Subject: [ofa-general] [PATCH] [master] dapl cma: add check before destroying cm event channel in release Message-ID: <002401c8a953$35b62b90$51fc070a@amr.corp.intel.com> library may be loaded and unloaded without calling open in which case the cm event channel is not created. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index a7ba3d6..1f41186 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -178,7 +178,8 @@ int32_t dapls_ib_release(void) { dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapl_ib_release: \n"); dapli_ib_thread_destroy(); - rdma_destroy_event_channel(g_cm_events); + if (g_cm_events != NULL) + rdma_destroy_event_channel(g_cm_events); return 0; } -- 1.5.2.5 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arlin.r.davis at intel.com Mon Apr 28 10:17:06 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 28 Apr 2008 10:17:06 -0700 Subject: [ofa-general] [PATCH][master] dapl: add vendor_err with DTO error logging Message-ID: DAPL_GET_CQE_VENDOR_ERR added to get vendor_err via cq entry. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_util.c | 16 ++++++++++------ dapl/openib_cma/dapl_ib_dto.h | 2 +- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index 32fbaba..293759f 100755 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -543,9 +543,12 @@ bail: return dat_status; } -#if defined(DAPL_DBG) && !defined(DAPL_GET_CQE_OP_STR) +#if !defined(DAPL_GET_CQE_OP_STR) #define DAPL_GET_CQE_OP_STR(e) "Unknown CEQ OP String?" #endif +#if !defined(DAPL_GET_CQE_VENDOR_ERR) +#define DAPL_GET_CQE_VENDOR_ERR(e) 0 +#endif /* * dapli_evd_eh_print_cqe @@ -565,7 +568,6 @@ dapli_evd_eh_print_cqe ( IN ib_work_completion_t *cqe_ptr) { #ifdef DAPL_DBG - dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, @@ -583,8 +585,9 @@ dapli_evd_eh_print_cqe ( DAPL_GET_CQE_BYTESNUM (cqe_ptr)); } dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t status %d\n", - DAPL_GET_CQE_STATUS (cqe_ptr)); + "\t\t status %d vendor_err 0x%x\n", + DAPL_GET_CQE_STATUS(cqe_ptr), + DAPL_GET_CQE_VENDOR_ERR(cqe_ptr)); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); #endif @@ -1215,9 +1218,10 @@ dapli_evd_cqe_to_event ( } dapl_log(DAPL_DBG_TYPE_ERR, - "DTO completion ERR: status %d, opcode %s \n", + "DTO completion ERR: status %d, opcode %s, vendor_err 0x%x\n", DAPL_GET_CQE_STATUS(cqe_ptr), - DAPL_GET_CQE_OP_STR(cqe_ptr)); + DAPL_GET_CQE_OP_STR(cqe_ptr), + DAPL_GET_CQE_VENDOR_ERR(cqe_ptr)); } } diff --git a/dapl/openib_cma/dapl_ib_dto.h b/dapl/openib_cma/dapl_ib_dto.h 
index a90aea2..b111e5e 100644 --- a/dapl/openib_cma/dapl_ib_dto.h +++ b/dapl/openib_cma/dapl_ib_dto.h @@ -458,10 +458,10 @@ STATIC _INLINE_ int dapls_cqe_opcode(ib_work_completion_t *cqe_p) } } - #define DAPL_GET_CQE_OPTYPE(cqe_p) dapls_cqe_opcode(cqe_p) #define DAPL_GET_CQE_WRID(cqe_p) ((ib_work_completion_t*)cqe_p)->wr_id #define DAPL_GET_CQE_STATUS(cqe_p) ((ib_work_completion_t*)cqe_p)->status +#define DAPL_GET_CQE_VENDOR_ERR(cqe_p) ((ib_work_completion_t*)cqe_p)->vendor_err #define DAPL_GET_CQE_BYTESNUM(cqe_p) ((ib_work_completion_t*)cqe_p)->byte_len #define DAPL_GET_CQE_IMMED_DATA(cqe_p) ((ib_work_completion_t*)cqe_p)->imm_data -- 1.5.2.5 From arlin.r.davis at intel.com Mon Apr 28 10:17:07 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 28 Apr 2008 10:17:07 -0700 Subject: [ofa-general] [PATCH][dat1.2] dapl: add vendor_err with DTO error logging Message-ID: <002901c8a953$affe56c0$51fc070a@amr.corp.intel.com> DAPL_GET_CQE_VENDOR_ERR added to get vendor_err via cq entry. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_util.c | 50 ++++++++++++++++++---------------------- dapl/openib_cma/dapl_ib_dto.h | 1 + 2 files changed, 24 insertions(+), 27 deletions(-) diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index 36b776c..2c95c6d 100644 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -485,6 +485,12 @@ bail: return dat_status; } +#if !defined(DAPL_GET_CQE_OP_STR) +#define DAPL_GET_CQE_OP_STR(e) "Unknown CEQ OP String?" 
+#endif +#if !defined(DAPL_GET_CQE_VENDOR_ERR) +#define DAPL_GET_CQE_VENDOR_ERR(e) 0 +#endif /* * dapli_evd_eh_print_cqe @@ -504,39 +510,28 @@ dapli_evd_eh_print_cqe ( IN ib_work_completion_t *cqe_ptr) { #ifdef DAPL_DBG - static char *optable[] = - { - "OP_SEND", - "OP_RDMA_READ", - "OP_RDMA_WRITE", - "OP_COMP_AND_SWAP", - "OP_FETCH_AND_ADD", - "OP_RECEIVE", - "OP_BIND_MW", - 0 - }; - dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); + "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t dapl_evd_dto_callback : CQE \n"); + "\t dapl_evd_dto_callback : CQE \n"); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t work_req_id %lli\n", - DAPL_GET_CQE_WRID (cqe_ptr)); + "\t\t work_req_id %lli\n", + DAPL_GET_CQE_WRID (cqe_ptr)); if (DAPL_GET_CQE_STATUS (cqe_ptr) == 0) { - dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t op_type: %s\n", - optable[DAPL_GET_CQE_OPTYPE (cqe_ptr)]); - dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t bytes_num %d\n", - DAPL_GET_CQE_BYTESNUM (cqe_ptr)); + dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, + "\t\t op_type: %s\n", + DAPL_GET_CQE_OP_STR(cqe_ptr)); + dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, + "\t\t bytes_num %d\n", + DAPL_GET_CQE_BYTESNUM (cqe_ptr)); } dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t status %d\n", - DAPL_GET_CQE_STATUS (cqe_ptr)); + "\t\t status %d vendor_err 0x%x\n", + DAPL_GET_CQE_STATUS(cqe_ptr), + DAPL_GET_CQE_VENDOR_ERR(cqe_ptr)); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); + "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); #endif return; } @@ -1171,9 +1166,10 @@ dapli_evd_cqe_to_event ( } dapl_log(DAPL_DBG_TYPE_ERR, - "DTO completion ERR: status %d, opcode %s \n", + "DTO completion ERR: status %d, opcode %s, vendor_err 0x%x\n", DAPL_GET_CQE_STATUS(cqe_ptr), - DAPL_GET_CQE_OP_STR(cqe_ptr)); + DAPL_GET_CQE_OP_STR(cqe_ptr), + DAPL_GET_CQE_VENDOR_ERR(cqe_ptr)); } } diff --git a/dapl/openib_cma/dapl_ib_dto.h 
b/dapl/openib_cma/dapl_ib_dto.h index 1a83718..52b189b 100644 --- a/dapl/openib_cma/dapl_ib_dto.h +++ b/dapl/openib_cma/dapl_ib_dto.h @@ -272,6 +272,7 @@ STATIC _INLINE_ int dapls_cqe_opcode(ib_work_completion_t *cqe_p) #define DAPL_GET_CQE_OPTYPE(cqe_p) dapls_cqe_opcode(cqe_p) #define DAPL_GET_CQE_WRID(cqe_p) ((ib_work_completion_t*)cqe_p)->wr_id #define DAPL_GET_CQE_STATUS(cqe_p) ((ib_work_completion_t*)cqe_p)->status +#define DAPL_GET_CQE_VENDOR_ERR(cqe_p) ((ib_work_completion_t*)cqe_p)->vendor_err #define DAPL_GET_CQE_BYTESNUM(cqe_p) ((ib_work_completion_t*)cqe_p)->byte_len #define DAPL_GET_CQE_IMMED_DATA(cqe_p) ((ib_work_completion_t*)cqe_p)->imm_data -- 1.5.2.5 From weiny2 at llnl.gov Mon Apr 28 11:03:32 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 28 Apr 2008 11:03:32 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080427171140.GI22406@sashak.voltaire.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <20080427171140.GI22406@sashak.voltaire.com> Message-ID: <20080428110332.6fb8e1d8.weiny2@llnl.gov> On Sun, 27 Apr 2008 17:11:40 +0000 Sasha Khapyorsky wrote: > Hi Ira, > > On 13:38 Wed 23 Apr , Ira Weiny wrote: > > > > The symptom is that nodes drop out of the IPoIB mcast group after a node > > temporarily goes catatonic. The details are: > > > > 1) Issues on a node cause a soft lockup of the node. > > 2) OpenSM does a normal light sweep. > > 3) MADs to the node time out since the node is in a "bad state" > > Normally during light sweep OpenSM will not query nodes. I think OpenSM > should not detect such soft lockup unless ib link state was changed and > heavy sweep was triggered. Is this the case? Yes I agree. Per my previous mail to Or I found that light sweeps did not in fact notice the nodes were gone. Looking at the logs I am not sure what caused OpenSM to notice them. However, something must have triggered a heavy sweep when those nodes were catatonic. 
From the logs they were unresponsive for multiple seconds, some as long as 30s. It is still a bit of a mystery why OpenSM did a heavy sweep during this period, but I don't think it is unreasonable for it to do so. > > > 4) OpenSM marks the node down and drops it from internal tables, including > > mcast groups. > > 5) Node recovers from soft lockup condition. > > 6) A subsequent sweep causes OpenSM to see the node and add it back to the > > fabric. > > 7) Node is fully functional on the verbs layer but IPoIB never knew anything > > was wrong so it does _not_ rejoin the mcast groups. (This is different > > from the condition where the link actually goes down.) > > If my approach above is correct it should be the same as port down/up > handling. And as was noted already in this thread OpenSM should ask > for reregistration (by setting the client reregistration bit). > > I see your patch - seems this part is buggy in OpenSM now, will look > closer at this. > Yes, I believe this is all fixed. Thanks again for everyone's help on this, Ira From arlin.r.davis at intel.com Mon Apr 28 11:47:38 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 28 Apr 2008 11:47:38 -0700 Subject: [ofa-general] [PATCH][dat1.2] dapl: cma provider needs to support lower inline send default for iWARP Message-ID: IB and iWARP work best with different defaults. Add a transport check and set the default accordingly: 64 for iWARP, 200 for IB. The DAPL_MAX_INLINE environment variable is still used to override. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 11 +++++++++-- dapl/openib_cma/dapl_ib_util.h | 3 ++- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index 56c0a05..4de5a2c 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -274,8 +274,15 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) (unsigned long long)bswap_64(gid->global.interface_id)); /* set inline max with env or default, get local lid and gid 0 */ - hca_ptr->ib_trans.max_inline_send = - dapl_os_get_env_val("DAPL_MAX_INLINE", INLINE_SEND_DEFAULT); + if (hca_ptr->ib_hca_handle->device->transport_type + == IBV_TRANSPORT_IWARP) + hca_ptr->ib_trans.max_inline_send = + dapl_os_get_env_val("DAPL_MAX_INLINE", + INLINE_SEND_IWARP_DEFAULT); + else + hca_ptr->ib_trans.max_inline_send = + dapl_os_get_env_val("DAPL_MAX_INLINE", + INLINE_SEND_IB_DEFAULT); /* set CM timer defaults */ hca_ptr->ib_trans.max_cm_timeout = diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 93f4fde..1e464b2 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -122,7 +122,8 @@ typedef struct _ib_wait_obj_handle #define IB_INVALID_HANDLE NULL /* inline send rdma threshold */ -#define INLINE_SEND_DEFAULT 128 +#define INLINE_SEND_IWARP_DEFAULT 64 +#define INLINE_SEND_IB_DEFAULT 200 /* CM private data areas */ #define IB_MAX_REQ_PDATA_SIZE 48 -- 1.5.2.5 From arlin.r.davis at intel.com Mon Apr 28 11:47:41 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 28 Apr 2008 11:47:41 -0700 Subject: [ofa-general] [PATCH][master] dapl: cma provider needs to support lower inline send default for iWARP Message-ID: <002a01c8a960$56e964f0$51fc070a@amr.corp.intel.com> IB and iWARP work best with different defaults. Add transport check and set default accordingly. 64 for iWARP, 200 for IB. 
DAPL_MAX_INLINE environment variable is still used to override. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 11 +++++++++-- dapl/openib_cma/dapl_ib_util.h | 3 ++- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index 1f41186..41986a3 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -270,8 +270,15 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) (unsigned long long)bswap_64(gid->global.interface_id)); /* set inline max with env or default, get local lid and gid 0 */ - hca_ptr->ib_trans.max_inline_send = - dapl_os_get_env_val("DAPL_MAX_INLINE", INLINE_SEND_DEFAULT); + if (hca_ptr->ib_hca_handle->device->transport_type + == IBV_TRANSPORT_IWARP) + hca_ptr->ib_trans.max_inline_send = + dapl_os_get_env_val("DAPL_MAX_INLINE", + INLINE_SEND_IWARP_DEFAULT); + else + hca_ptr->ib_trans.max_inline_send = + dapl_os_get_env_val("DAPL_MAX_INLINE", + INLINE_SEND_IB_DEFAULT); /* set CM timer defaults */ hca_ptr->ib_trans.max_cm_timeout = diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 71593fd..3368180 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -111,7 +111,8 @@ typedef struct _ib_wait_obj_handle #define IB_INVALID_HANDLE NULL /* inline send rdma threshold */ -#define INLINE_SEND_DEFAULT 64 +#define INLINE_SEND_IWARP_DEFAULT 64 +#define INLINE_SEND_IB_DEFAULT 200 /* CMA private data areas */ #define CMA_PDATA_HDR 36 -- 1.5.2.5 From jimmott at austin.rr.com Mon Apr 28 12:43:35 2008 From: jimmott at austin.rr.com (Jim Mott) Date: Mon, 28 Apr 2008 14:43:35 -0500 Subject: [ofa-general] SDP poll() behavior In-Reply-To: <68D58DEFB8673048A64DE1FBE56BEE1807CF50EC@CINMLVEM11.e2k.ad.ge.com> References: <68D58DEFB8673048A64DE1FBE56BEE1807CF50EC@CINMLVEM11.e2k.ad.ge.com> Message-ID: <005f01c8a968$262c9cd0$7285d670$@rr.com> I agree 
that SDP should have the same behavior as TCP in this situation. Bug 1020 has been opened so we can track the issue. Thanks David! -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Chevalier, David (GE Healthcare) Sent: Monday, April 28, 2008 10:14 AM To: general at lists.openfabrics.org Subject: [ofa-general] SDP poll() behavior Hi SDP developers, I've noticed an apparent difference between SDP and TCP/IP handling of a certain scenario (OFED 1.3), not necessarily a bug, but wondering if it might be better to behave more like TCP/IP in this case: the receiver and sender use non-blocking sockets (SDP) and monitor through poll(). The sender writes a known quantity of data through many calls to write(), then closes its side of the socket. The receiver polls the socket, and reads the data through many calls to read(), then closes its socket. The receiver monitors poll() revents for POLLERR, POLLHUP and POLLIN. On the receiver's last expected pass through the poll() loop to read() the last remaining data, I'll often get revents of {POLLERR|POLLHUP|POLLIN}, likely due to the sender closing its socket after the last write(). If my poll() handling loop goes in this order: check/handle POLLERR, check/handle POLLHUP, check/handle POLLIN, then it fails, because I don't expect to be able to read() data when poll() returns POLLERR or POLLHUP. If I change the order and handle POLLIN first, then read() works and gets the last data. I've never encountered this in TCP/IP - that is to say, for TCP/IP I first receive a clean POLLIN from poll(), then the next poll() (after I read() the data) returns POLLHUP (without the POLLERR). If I get POLLERR from poll(), I'd expect a subsequent call to read() to return an error, not valid data... While this is probably an "implementation defined" behavior, it seems like a good idea to try to behave the same as the TCP/IP sockets that SDP aims to replace... 
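David's ordering issue can be sketched in plain userspace C. This is only an illustrative, hypothetical receiver loop (a local socketpair stands in for the SDP socket): draining POLLIN before honoring POLLHUP/POLLERR recovers the final data even when all three bits are reported together.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Receiver loop: handle POLLIN before POLLHUP/POLLERR, since all
 * three may be reported together after the peer's close(). Returns
 * the total bytes read before honoring the hangup. */
static int drain_then_hangup(int fd)
{
    char buf[64];
    int total = 0;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, 1000) <= 0)
            break;                      /* timeout or poll error */
        if (pfd.revents & POLLIN) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n > 0) {
                total += (int)n;
                continue;               /* keep draining */
            }
            if (n == 0)
                break;                  /* orderly EOF */
        }
        if (pfd.revents & (POLLHUP | POLLERR))
            break;                      /* peer gone, nothing left */
    }
    return total;
}

/* Simulate the scenario: the sender writes, then closes right away,
 * so the receiver's poll() can report POLLIN|POLLHUP in one revents. */
static int run_scenario(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    (void)write(sv[0], "0123456789", 10);
    close(sv[0]);                       /* close after the last write */
    int got = drain_then_hangup(sv[1]);
    close(sv[1]);
    return got;
}
```

With the buggy ordering (POLLERR/POLLHUP checked above the POLLIN branch), the loop would bail out first and drop the final bytes still queued in the socket.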
Regards, Dave _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Mon Apr 28 08:57:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 08:57:12 -0700 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: (Olga Shern's message of "Mon, 28 Apr 2008 18:22:23 +0300") References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: > Also it is very important for us that IPoIB 2 kernel panics will be fixed ( > https://bugs.openfabrics.org/show_bug.cgi?id=989, > https://bugs.openfabrics.org/show_bug.cgi?id=985) Are either of these panics seen with upstream kernels? If we don't know then this points to a serious problem with the OFED model: we are diluting testing resources from the upstream kernel, which hurts the quality of the kernel that most users get from their distro. 
_______________________________________________ ewg mailing list ewg at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From clameter at sgi.com Mon Apr 28 13:34:11 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 28 Apr 2008 13:34:11 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080427122727.GO9514@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> Message-ID: On Sun, 27 Apr 2008, Andrea Arcangeli wrote: > Talking about post 2.6.26: the refcount with rcu in the anon-vma > conversion seems unnecessary and may explain part of the AIM slowdown > too. The rest looks ok and probably we should switch the code to a > compile-time decision between rwlock and rwsem (so obsoleting the > current spinlock). You are going to take a semaphore in an rcu section? Guess you did not activate all debugging options while testing? I was not aware that you can take a sleeping lock from a non-preemptible context.
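Christoph's objection can be made concrete with a kernel-style sketch. This is illustrative only, not compilable on its own; the `sem` field in the second half is hypothetical, standing in for an rwsem-converted anon_vma lock.

```c
/* What 2.6.25's page_lock_anon_vma() effectively does: a spinlock
 * nests safely inside the RCU read side, because it never sleeps. */
rcu_read_lock();                 /* may disable preemption */
anon_mapping = (unsigned long) page->mapping;
anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
spin_lock(&anon_vma->lock);      /* OK: non-sleeping lock */
/* ... walk the anon_vma's vma list ... */
spin_unlock(&anon_vma->lock);
rcu_read_unlock();

/* A naive rwsem conversion is the pattern Christoph objects to: */
rcu_read_lock();
down_read(&anon_vma->sem);       /* BUG: may sleep while preemption
                                    is disabled by rcu_read_lock() */
```

This is why a sleeping-lock conversion would need some other way to pin the anon_vma (such as the refcount Andrea mentions) before leaving the RCU section, rather than blocking inside it.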
From olga.shern at gmail.com Mon Apr 28 09:14:39 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Mon, 28 Apr 2008 19:14:39 +0300 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: On 4/28/08, Roland Dreier wrote: > > > Also it is very important for us that IPoIB 2 kernel panics will be > fixed ( > > https://bugs.openfabrics.org/show_bug.cgi?id=989, > > https://bugs.openfabrics.org/show_bug.cgi?id=985) > > Are either of these panics seen with upstream kernels? > > https://bugs.openfabrics.org/show_bug.cgi?id=989 is OFED bug https://bugs.openfabrics.org/show_bug.cgi?id=985 we will try to reproduce it on upstream kernel and let you know -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ ewg mailing list ewg at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From rdreier at cisco.com Mon Apr 28 14:45:21 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 14:45:21 -0700 Subject: [ofa-general] Re: [PATCH 1/2] IB/iSER: Do not add unsolicited data offset to VA in iSER header In-Reply-To: <39C75744D164D948A170E9792AF8E7CAF60D36@exil.voltaire.com> (Erez Zilber's message of "Sun, 27 Apr 2008 21:53:41 +0300") References: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> <39C75744D164D948A170E9792AF8E7CAF60D36@exil.voltaire.com> Message-ID: > See Eli's answer here: > > http://lists.openfabrics.org/pipermail/general/2008-April/049248.html Does everyone agree with that? Pete? stgt developers? - R. 
From rdreier at cisco.com Mon Apr 28 15:34:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:34:00 -0700 Subject: [ofa-general][PATCH 9/12 v1] mlx4: Collapsed CQ support In-Reply-To: <480F506D.9020202@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 23 Apr 2008 18:06:21 +0300") References: <480F506D.9020202@mellanox.co.il> Message-ID: thanks, applied From dks at mediaweb.com Mon Apr 28 15:43:02 2008 From: dks at mediaweb.com (DK Smith) Date: Mon, 28 Apr 2008 15:43:02 -0700 Subject: [ofa-general] install.sh question In-Reply-To: <48141EC1.7010801@dev.mellanox.co.il> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> Message-ID: <481652F6.50008@mediaweb.com> > Hi Frank, > install.sh checks if there are binary RPMs for all selected packages > under the OFED-x.x.x/RPMS directory. > If you have created binary RPMs on one of the nodes (by the install.sh > script), then make sure that the OFED-x.x.x/ofed.conf file includes only > these packages. > Then run on all cluster nodes (no kernel sources, compilers, ... > required on these nodes): >> ./install.sh -c ofed.conf -net ofed_net.conf > > Note: If there are no RPMs for one or more of the packages selected > (package_name=y) in the ofed.conf file then install.sh will run the RPM > build process. > > Regards, > Vladimir Is the NEW & IMPROVED installer, install.pl, a drop-in replacement for build.sh? I recently wrote a set of build scripts that are used to build a distribution (kernel + modules + root file system) for deployment elsewhere (i.e. a non-native build of everything including OFED). In the OFED 1.2 installer, I used this method of invocation: ./build.sh -c wherein build.sh locates the config file, "ofed.conf", in the same directory. That worked. The statement about "run on all cluster nodes" appears to indicate a non-native build is no longer possible. 
Cheers, DK From rdreier at cisco.com Mon Apr 28 15:44:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:44:17 -0700 Subject: [ofa-general] [PATCH 2.6.26 1/3] RDMA/cxgb3: Correctly serialize peer abort path. In-Reply-To: <20080427160006.31018.66715.stgit@dell3.ogc.int> (Steve Wise's message of "Sun, 27 Apr 2008 11:00:06 -0500") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160006.31018.66715.stgit@dell3.ogc.int> Message-ID: OK, applied, with a few fixups based on checkpatch output -- mostly __FUNCTION__ -> __func__ (__FUNCTION__ is a deprecated gcc-specific extension, __func__ is standard), and also a couple "abort=0" -> "abort = 0". - R. From rdreier at cisco.com Mon Apr 28 15:45:40 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:45:40 -0700 Subject: [ofa-general] Re: [PATCH 2.6.26 2/3] RDMA/cxgb3: Correctly set the max_mr_size device attribute. In-Reply-To: <20080427160008.31018.15516.stgit@dell3.ogc.int> (Steve Wise's message of "Sun, 27 Apr 2008 11:00:08 -0500") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160008.31018.15516.stgit@dell3.ogc.int> Message-ID: thanks, applied From rdreier at cisco.com Mon Apr 28 15:47:27 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:47:27 -0700 Subject: [ofa-general] [PATCH 2.6.26 1/3] RDMA/cxgb3: Correctly serialize peer abort path. In-Reply-To: (Roland Dreier's message of "Mon, 28 Apr 2008 15:44:17 -0700") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160006.31018.66715.stgit@dell3.ogc.int> Message-ID: oh yeah, and I deleted an unused "out" label From rdreier at cisco.com Mon Apr 28 15:54:29 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:54:29 -0700 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. 
In-Reply-To: <20080427160010.31018.67436.stgit@dell3.ogc.int> (Steve Wise's message of "Sun, 27 Apr 2008 11:00:10 -0500") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160010.31018.67436.stgit@dell3.ogc.int> Message-ID: thanks applied (and I see you deleted the unused label in this patch, heh) From rdreier at cisco.com Mon Apr 28 16:00:04 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 16:00:04 -0700 Subject: [ofa-general] Re: IB/ehca: handle negative return value from ibmebus_request_irq() properly in ehca_create_eq() In-Reply-To: <200804281847.44968.hnguyen@linux.vnet.ibm.com> (Hoang-Nam Nguyen's message of "Mon, 28 Apr 2008 18:47:44 +0200") References: <200804281847.44968.hnguyen@linux.vnet.ibm.com> Message-ID: thanks, applied From rdreier at cisco.com Mon Apr 28 16:07:35 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 16:07:35 -0700 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: <20080428163731.GL30919@sgi.com> (akepner@sgi.com's message of "Mon, 28 Apr 2008 09:37:31 -0700") References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> <20080428163731.GL30919@sgi.com> Message-ID: > I just saw this bug report today, but we've had similar crashes. > Looks like the problem is that in ipoib_neigh_cleanup() this is > done (no locking): > > neigh = *to_ipoib_neigh(n); > > then later: > > spin_lock_irqsave(&priv->lock, flags); > if (neigh->ah) > ah = neigh->ah; > list_del(&neigh->list); <---- neigh may be stale now > ipoib_neigh_free(n->dev, neigh); > spin_unlock_irqrestore(&priv->lock, flags); > > neigh wasn't re-read after acquiring the lock, so it may point > to an already freed data structure. Ugh, looks delicate to fix properly, since we don't have a lock to take until we find out whether the neighbour is attached to an IPoIB device. 
> Unable to handle kernel paging request at 0000000000100108 > ^^^^^^^^^^^^^^^^ > LIST_POISON1 + 0x8 strange that the ofa bugzilla entry has a different address it's crashing at. From andrea at qumranet.com Mon Apr 28 17:10:52 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 29 Apr 2008 02:10:52 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> Message-ID: <20080429001052.GA8315@duo.random> On Mon, Apr 28, 2008 at 01:34:11PM -0700, Christoph Lameter wrote: > On Sun, 27 Apr 2008, Andrea Arcangeli wrote: > > > Talking about post 2.6.26: the refcount with rcu in the anon-vma > > conversion seems unnecessary and may explain part of the AIM slowdown > > too. The rest looks ok and probably we should switch the code to a > > compile-time decision between rwlock and rwsem (so obsoleting the > > current spinlock). > > You are going to take a semphore in an rcu section? Guess you did not > activate all debugging options while testing? I was not aware that you can > take a sleeping lock from a non preemptible context. I'd hoped to discuss this topic after mmu-notifier-core was already merged, but let's do it anyway. My point of view is that there was no rcu when I wrote that code, yet there was no reference count and yet all locking looks still exactly the same as I wrote it. There's even still the page_table_lock to serialize threads taking the mmap_sem in read mode against the first vma->anon_vma = anon_vma during the page fault. Frankly I've absolutely no idea why rcu is needed in all rmap code when walking the page->mapping. 
Definitely the PG_locked is taken so there's no way page->mapping could possibly go away under the rmap code, hence the anon_vma can't go away as it's queued in the vma, and the vma has to go away before the page is zapped out of the pte. So there are some possible scenarios:

1) my original anon_vma code was buggy not taking the rcu_read_lock() and somebody fixed it (I tend to exclude it)

2) somebody has seen a race that doesn't exist and didn't bother to document it other than with this obscure comment

 * Getting a lock on a stable anon_vma from a page off the LRU is
 * tricky: page_lock_anon_vma rely on RCU to guard against the races.

I tend to exclude it too as VM folks are too smart for this to be the case.

3) somebody did some microoptimization using rcu and we surely can undo that microoptimization to get the code back to my original code that didn't need rcu even though it worked exactly the same, and that is going to be cheaper to use with semaphores than doubling the number of locked ops for every lock instruction.

Now the double atomic op may not be horrible when not contended, as it works on the same cacheline, but with cacheline bouncing under contention it sounds doubly horrible compared to a single cacheline bounce, and I don't see the point of it as you can't use rcu anyways, so you can't possibly take advantage of whatever microoptimization was done over the original locking.
From clameter at sgi.com Mon Apr 28 18:28:06 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 28 Apr 2008 18:28:06 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080429001052.GA8315@duo.random> References: <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> Message-ID: On Tue, 29 Apr 2008, Andrea Arcangeli wrote: > Frankly I've absolutely no idea why rcu is needed in all rmap code > when walking the page->mapping. Definitely the PG_locked is taken so > there's no way page->mapping could possibly go away under the rmap > code, hence the anon_vma can't go away as it's queued in the vma, and > the vma has to go away before the page is zapped out of the pte. zap_pte_range can race with the rmap code and it does not take the page lock. The page may not go away since a refcount was taken but the mapping can go away. Without RCU you have no guarantee that the anon_vma still exists when you take the lock. How long were you away from VM development? > Now the double atomic op may not be horrible when not contended, as it > works on the same cacheline but with cacheline bouncing with > contention it sounds doubly horrible compared to a single cacheline bounce > and I don't see the point of it as you can't use rcu anyways, so you > can't possibly take advantage of whatever microoptimization done over > the original locking. Cachelines are acquired for exclusive use for a minimum duration. Multiple atomic operations can be performed after a cacheline becomes exclusive without danger of bouncing.
From gstreiff at neteffect.com Mon Apr 28 21:24:04 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Mon, 28 Apr 2008 23:24:04 -0500 Subject: [ofa-general] [ PATCH 1/3 ] RDMA/nes LRO enablement Message-ID: <200804290424.m3T4O4i4018169@velma.neteffect.com> From: Faisal Latif Adding Large Receive Offload (LRO) enablement to iw_nes module. Signed-off-by: Faisal Latif --- drivers/infiniband/hw/nes/Kconfig | 1 + drivers/infiniband/hw/nes/nes.c | 4 +++ drivers/infiniband/hw/nes/nes.h | 1 + drivers/infiniband/hw/nes/nes_hw.c | 53 ++++++++++++++++++++++++++++++----- drivers/infiniband/hw/nes/nes_hw.h | 11 ++++++- drivers/infiniband/hw/nes/nes_nic.c | 12 +++++++- 6 files changed, 70 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/hw/nes/Kconfig b/drivers/infiniband/hw/nes/Kconfig index 2aeb7ac..d449eb6 100644 --- a/drivers/infiniband/hw/nes/Kconfig +++ b/drivers/infiniband/hw/nes/Kconfig @@ -2,6 +2,7 @@ config INFINIBAND_NES tristate "NetEffect RNIC Driver" depends on PCI && INET && INFINIBAND select LIBCRC32C + select INET_LRO ---help--- This is a low-level driver for NetEffect RDMA enabled Network Interface Cards (RNIC). 
diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index a4e9269..9f7364a 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -91,6 +91,10 @@ unsigned int nes_debug_level = 0; module_param_named(debug_level, nes_debug_level, uint, 0644); MODULE_PARM_DESC(debug_level, "Enable debug output level"); +unsigned int nes_lro_max_aggr = NES_LRO_MAX_AGGR; +module_param(nes_lro_max_aggr, int, NES_LRO_MAX_AGGR); +MODULE_PARM_DESC(nes_mro_max_aggr, " nic LRO MAX packet aggregation"); + LIST_HEAD(nes_adapter_list); static LIST_HEAD(nes_dev_list); diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index cdf2e9a..484b5e3 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -173,6 +173,7 @@ extern int disable_mpa_crc; extern unsigned int send_first; extern unsigned int nes_drv_opt; extern unsigned int nes_debug_level; +extern unsigned int nes_lro_max_aggr; extern struct list_head nes_adapter_list; diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 08964cc..197eee9 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -38,6 +38,7 @@ #include #include #include #include +#include #include "nes.h" @@ -1375,6 +1376,25 @@ static void nes_rq_wqes_timeout(unsigned } +static int nes_lro_get_skb_hdr(struct sk_buff *skb, void **iphdr, + void **tcph, u64 *hdr_flags, void *priv) +{ + unsigned int ip_len; + struct iphdr *iph; + skb_reset_network_header(skb); + iph = ip_hdr(skb); + if (iph->protocol != IPPROTO_TCP) + return -1; + ip_len = ip_hdrlen(skb); + skb_set_transport_header(skb, ip_len); + *tcph = tcp_hdr(skb); + + *hdr_flags = LRO_IPV4 | LRO_TCP; + *iphdr = iph; + return 0; +} + + /** * nes_init_nic_qp */ @@ -1592,15 +1612,21 @@ int nes_init_nic_qp(struct nes_device *n nesvnic->rq_wqes_timer.function = nes_rq_wqes_timeout; nesvnic->rq_wqes_timer.data = (unsigned long)nesvnic; 
nes_debug(NES_DBG_INIT, "NAPI support Enabled\n"); - if (nesdev->nesadapter->et_use_adaptive_rx_coalesce) { nes_nic_init_timer(nesdev); if (netdev->mtu > 1500) jumbomode = 1; - nes_nic_init_timer_defaults(nesdev, jumbomode); - } - + nes_nic_init_timer_defaults(nesdev, jumbomode); + } + nesvnic->lro_mgr.max_aggr = NES_LRO_MAX_AGGR; + nesvnic->lro_mgr.max_desc = NES_MAX_LRO_DESCRIPTORS; + nesvnic->lro_mgr.lro_arr = nesvnic->lro_desc; + nesvnic->lro_mgr.get_skb_header = nes_lro_get_skb_hdr; + nesvnic->lro_mgr.features = LRO_F_NAPI | LRO_F_EXTRACT_VLAN_ID; + nesvnic->lro_mgr.dev = netdev; + nesvnic->lro_mgr.ip_summed = CHECKSUM_UNNECESSARY; + nesvnic->lro_mgr.ip_summed_aggr = CHECKSUM_UNNECESSARY; return 0; } @@ -2254,10 +2280,13 @@ void nes_nic_ce_handler(struct nes_devic u16 pkt_type; u16 rqes_processed = 0; u8 sq_cqes = 0; + u8 nes_use_lro = 0; head = cq->cq_head; cq_size = cq->cq_size; cq->cqes_pending = 1; + if (nesvnic->netdev->features & NETIF_F_LRO) + nes_use_lro = 1; do { if (le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX]) & NES_NIC_CQE_VALID) { @@ -2379,9 +2408,16 @@ void nes_nic_ce_handler(struct nes_devic >> 16); nes_debug(NES_DBG_CQ, "%s: Reporting stripped VLAN packet. 
Tag = 0x%04X\n", nesvnic->netdev->name, vlan_tag); - nes_vlan_rx(rx_skb, nesvnic->vlan_grp, vlan_tag); + if (nes_use_lro) + lro_vlan_hwaccel_receive_skb(&nesvnic->lro_mgr, rx_skb, + nesvnic->vlan_grp, vlan_tag, NULL); + else + nes_vlan_rx(rx_skb, nesvnic->vlan_grp, vlan_tag); } else { - nes_netif_rx(rx_skb); + if (nes_use_lro) + lro_receive_skb(&nesvnic->lro_mgr, rx_skb, NULL); + else + nes_netif_rx(rx_skb); } } @@ -2413,13 +2449,14 @@ void nes_nic_ce_handler(struct nes_devic } while (1); + if (nes_use_lro) + lro_flush_all(&nesvnic->lro_mgr); if (sq_cqes) { barrier(); /* restart the queue if it had been stopped */ if (netif_queue_stopped(nesvnic->netdev)) netif_wake_queue(nesvnic->netdev); } - cq->cq_head = head; /* nes_debug(NES_DBG_CQ, "CQ%u Processed = %u cqes, new head = %u.\n", cq->cq_number, cqe_count, cq->cq_head); */ @@ -2432,7 +2469,7 @@ void nes_nic_ce_handler(struct nes_devic } if (atomic_read(&nesvnic->rx_skbs_needed)) nes_replenish_nic_rq(nesvnic); - } +} /** diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 8f36e23..1363995 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -33,6 +33,8 @@ #ifndef __NES_HW_H #define __NES_HW_H +#include + #define NES_PHY_TYPE_1G 2 #define NES_PHY_TYPE_IRIS 3 #define NES_PHY_TYPE_PUMA_10G 6 @@ -982,8 +984,10 @@ struct nes_hw_tune_timer { #define NES_TIMER_INT_LIMIT 2 #define NES_TIMER_INT_LIMIT_DYNAMIC 10 #define NES_TIMER_ENABLE_LIMIT 4 -#define NES_MAX_LINK_INTERRUPTS 128 -#define NES_MAX_LINK_CHECK 200 +#define NES_MAX_LINK_INTERRUPTS 128 +#define NES_MAX_LINK_CHECK 200 +#define NES_MAX_LRO_DESCRIPTORS 32 +#define NES_LRO_MAX_AGGR 64 struct nes_adapter { u64 fw_ver; @@ -1183,6 +1187,9 @@ struct nes_vnic { u8 of_device_registered; u8 rdma_enabled; u8 rx_checksum_disabled; + u32 lro_max_aggr; + struct net_lro_mgr lro_mgr; + struct net_lro_desc lro_desc[ NES_MAX_LRO_DESCRIPTORS ]; }; struct nes_ib_device { diff --git 
a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index e5366b0..6998af0 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -936,8 +936,7 @@ static int nes_netdev_change_mtu(struct return ret; } -#define NES_ETHTOOL_STAT_COUNT 55 -static const char nes_ethtool_stringset[NES_ETHTOOL_STAT_COUNT][ETH_GSTRING_LEN] = { +static const char nes_ethtool_stringset[][ETH_GSTRING_LEN] = { "Link Change Interrupts", "Linearized SKBs", "T/GSO Requests", @@ -993,8 +992,12 @@ static const char nes_ethtool_stringset[ "CQ Depth 32", "CQ Depth 128", "CQ Depth 256", + "LRO aggregated", + "LRO flushed", + "LRO no_desc", }; +#define NES_ETHTOOL_STAT_COUNT ARRAY_SIZE(nes_ethtool_stringset) /** * nes_netdev_get_rx_csum @@ -1189,6 +1192,9 @@ static void nes_netdev_get_ethtool_stats target_stat_values[52] = int_mod_cq_depth_32; target_stat_values[53] = int_mod_cq_depth_128; target_stat_values[54] = int_mod_cq_depth_256; + target_stat_values[55] = nesvnic->lro_mgr.stats.aggregated; + target_stat_values[56] = nesvnic->lro_mgr.stats.flushed; + target_stat_values[57] = nesvnic->lro_mgr.stats.no_desc; } @@ -1454,6 +1460,8 @@ static struct ethtool_ops nes_ethtool_op .set_sg = ethtool_op_set_sg, .get_tso = ethtool_op_get_tso, .set_tso = ethtool_op_set_tso, + .get_flags = ethtool_op_get_flags, + .set_flags = ethtool_op_set_flags, }; From gstreiff at neteffect.com Mon Apr 28 21:25:46 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Mon, 28 Apr 2008 23:25:46 -0500 Subject: [ofa-general] [ PATCH 2/3 ] RDMA/nes SFP+ enablement Message-ID: <200804290425.m3T4PkKq018184@velma.neteffect.com> From: Eric Schneider This patch enables the iw_nes module for NetEffect RNICs to support additional PHYs including SFP+ optical transceivers (referred to as ARGUS in the code). 
Signed-off-by: Eric Schneider Signed-off-by: Glenn Streiff --- drivers/infiniband/hw/nes/nes.h | 4 - drivers/infiniband/hw/nes/nes_hw.c | 210 ++++++++++++++++++++++++++++----- drivers/infiniband/hw/nes/nes_hw.h | 6 + drivers/infiniband/hw/nes/nes_nic.c | 69 +++++++---- drivers/infiniband/hw/nes/nes_utils.c | 10 -- 5 files changed, 237 insertions(+), 62 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index 484b5e3..1f9f7bf 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -536,8 +536,8 @@ int nes_register_ofa_device(struct nes_i int nes_read_eeprom_values(struct nes_device *, struct nes_adapter *); void nes_write_1G_phy_reg(struct nes_device *, u8, u8, u16); void nes_read_1G_phy_reg(struct nes_device *, u8, u8, u16 *); -void nes_write_10G_phy_reg(struct nes_device *, u16, u8, u16); -void nes_read_10G_phy_reg(struct nes_device *, u16, u8); +void nes_write_10G_phy_reg(struct nes_device *, u16, u8, u16, u16); +void nes_read_10G_phy_reg(struct nes_device *, u8, u8, u16); struct nes_cqp_request *nes_get_cqp_request(struct nes_device *); void nes_post_cqp_request(struct nes_device *, struct nes_cqp_request *, int); int nes_arp_table(struct nes_device *, u32, u8 *, u32); diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 197eee9..19f2a5b 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -1208,11 +1208,15 @@ int nes_init_phy(struct nes_device *nesd { struct nes_adapter *nesadapter = nesdev->nesadapter; u32 counter = 0; + u32 sds_common_control0; u32 mac_index = nesdev->mac_index; - u32 tx_config; + u32 tx_config = 0; u16 phy_data; + u32 temp_phy_data = 0; + u32 temp_phy_data2 = 0; + u32 i =0; - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { nes_debug(NES_DBG_PHY, "1G PHY, mac_index = %d.\n", mac_index); if (nesadapter->phy_type[mac_index] 
== NES_PHY_TYPE_1G) { printk(PFX "%s: Programming mdc config for 1G\n", __func__); @@ -1278,12 +1282,108 @@ int nes_init_phy(struct nes_device *nesd nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], phy_data | 0x0300); } else { - if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) { + if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) || (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { /* setup 10G MDIO operation */ tx_config = nes_read_indexed(nesdev, NES_IDX_MAC_TX_CONFIG); tx_config |= 0x14; nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } + if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + mdelay(10); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + + /* if firmware is already running (like from a driver un-load/load, don't do anything. 
*/ + if (temp_phy_data == temp_phy_data2) { + /* configure QT2505 AMCC PHY */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0x0000, 0x8000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc302, 0x0044); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc318, 0x0052); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc319, 0x0008); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc31a, 0x0098); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0026, 0x0E00); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0027, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0028, 0xA528); + + //remove micro from reset; chip boots from ROM, uploads EEPROM f/w image, uC executes f/w + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0002); + + //wait for heart beat to start to know loading is done + counter = 0; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from heartbeat check \n"); + break; + } + mdelay(100); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + } while ( (temp_phy_data2 == temp_phy_data) ); + + + //wait for tracking to start to know f/w is good to go. 
+ counter = 0; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7fd); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from status check \n"); + break; + } + mdelay(1000); +// nes_debug(NES_DBG_PHY, "AMCC PHY- phy_status not ready yet = 0x%02X\n", temp_phy_data); + } while ( ((temp_phy_data & 0xff) != 0x50) && ((temp_phy_data & 0xff) != 0x70) ); + + + + + //set LOS Control invert RXLOSB_I_PADINV + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd003, 0x0000); + //set LOS Control to mask of RXLOSB_I + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc314, 0x0042); + //set LED1 to input mode (LED1 and LED2 share same LED) + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd006, 0x0007); + //set LED2 to RX link_status and activity + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd007, 0x000A); + //set LED3 to RX link_status + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd008, 0x0009); + + // reset the res-calibration on t2 serdes, ensures it is stable after the amcc phy is stable. + + sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); + sds_common_control0 |= 0x1; + nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); + + //release the res-calibration reset. 
+ sds_common_control0 &= 0xfffffffe; + nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); + + + i=0; + while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) && (i++ < 5000)) { + /* mdelay(1); */ + } + + + + // wait for link train done before moving on, or will get an interupt storm + counter = 0; + do { + temp_phy_data = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 +(0x200*(nesdev->mac_index&1) )); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from link train wait \n"); + break; + } + mdelay(1); + } while ( ((temp_phy_data & 0x0f1f0000) != 0x0f0f0000) ); + } + } } return 0; } @@ -2107,6 +2207,8 @@ static void nes_process_mac_intr(struct u32 u32temp; u16 phy_data; u16 temp_phy_data; + u32 pcs_val = 0x0f0f0000; + u32 pcs_mask = 0x0f1f0000; spin_lock_irqsave(&nesadapter->phy_lock, flags); if (nesadapter->mac_sw_state[mac_number] != NES_MAC_SW_IDLE) { @@ -2170,13 +2272,29 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "Eth SERDES Common Status: 0=0x%08X, 1=0x%08X\n", nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0), nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0+0x200)); - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + + if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_PUMA_1G) { + switch (mac_index) { + case 1: + case 3: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + 0x200); + break; + default: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0); + break; + } + } else { + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + } + 
nes_debug(NES_DBG_PHY, "PCS PHY Control/Status%u: 0x%08X\n", mac_index, pcs_control_status); - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { u32temp = 0x01010000; if (nesadapter->port_count > 2) { u32temp |= 0x02020000; @@ -2185,24 +2303,58 @@ static void nes_process_mac_intr(struct phy_data = 0; nes_debug(NES_DBG_PHY, "PCS says the link is down\n"); } - } else if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) { - nes_read_10G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index]); - temp_phy_data = (u16)nes_read_indexed(nesdev, - NES_IDX_MAC_MDIO_CONTROL); - u32temp = 20; - do { - nes_read_10G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index]); - phy_data = (u16)nes_read_indexed(nesdev, - NES_IDX_MAC_MDIO_CONTROL); - if ((phy_data == temp_phy_data) || (!(--u32temp))) - break; - temp_phy_data = phy_data; - } while (1); - nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", - __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); - } else { - phy_data = (0x0f0f0000 == (pcs_control_status & 0x0f1f0000)) ? 4 : 0; + switch (nesadapter->phy_type[mac_index]) { + case NES_PHY_TYPE_IRIS: + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 20; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down[mac_index] ? "DOWN" : "UP"); + break; + case NES_PHY_TYPE_ARGUS: + //clear the alarms. 
+ nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0x0008); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc001); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc002); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc005); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc006); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9003); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9004); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9005); + //check link status + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 100; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); + break; + case NES_PHY_TYPE_PUMA_1G: + if (mac_index < 2) { + pcs_val = pcs_mask = 0x01010000; + } else { + pcs_val = pcs_mask = 0x02020000; + } + /* fall through */ + default: + phy_data = (pcs_val == (pcs_control_status & pcs_mask)) ? 0x4 : 0x0; + break; + } } if (phy_data & 0x0004) { @@ -2211,8 +2363,8 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "The Link is UP!!. 
linkup was %d\n", nesvnic->linkup); if (nesvnic->linkup == 0) { - printk(PFX "The Link is now up for port %u, netdev %p.\n", - mac_index, nesvnic->netdev); + printk(PFX "The Link is now up for port %s, netdev %p.\n", + nesvnic->netdev->name, nesvnic->netdev); if (netif_queue_stopped(nesvnic->netdev)) netif_start_queue(nesvnic->netdev); nesvnic->linkup = 1; @@ -2225,8 +2377,8 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "The Link is Down!!. linkup was %d\n", nesvnic->linkup); if (nesvnic->linkup == 1) { - printk(PFX "The Link is now down for port %u, netdev %p.\n", - mac_index, nesvnic->netdev); + printk(PFX "The Link is now down for port %s, netdev %p.\n", + nesvnic->netdev->name, nesvnic->netdev); if (!(netif_queue_stopped(nesvnic->netdev))) netif_stop_queue(nesvnic->netdev); nesvnic->linkup = 0; diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 1363995..7d47f92 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -35,8 +35,10 @@ #define __NES_HW_H #include -#define NES_PHY_TYPE_1G 2 -#define NES_PHY_TYPE_IRIS 3 +#define NES_PHY_TYPE_1G 2 +#define NES_PHY_TYPE_IRIS 3 +#define NES_PHY_TYPE_ARGUS 4 +#define NES_PHY_TYPE_PUMA_1G 5 #define NES_PHY_TYPE_PUMA_10G 6 #define NES_MULTICAST_PF_MAX 8 diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 6998af0..5ba9dd3 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -1377,21 +1377,31 @@ static int nes_netdev_get_settings(struc et_cmd->duplex = DUPLEX_FULL; et_cmd->port = PORT_MII; + if (nesadapter->OneG_Mode) { - et_cmd->supported = SUPPORTED_1000baseT_Full|SUPPORTED_Autoneg; - et_cmd->advertising = ADVERTISED_1000baseT_Full|ADVERTISED_Autoneg; et_cmd->speed = SPEED_1000; - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], - &phy_data); - if (phy_data&0x1000) { - et_cmd->autoneg = AUTONEG_ENABLE; - } else { + if 
(nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + et_cmd->supported = SUPPORTED_1000baseT_Full; + et_cmd->advertising = ADVERTISED_1000baseT_Full; et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->transceiver = XCVR_INTERNAL; + et_cmd->phy_address = nesdev->mac_index; + } else { + et_cmd->supported = SUPPORTED_1000baseT_Full|SUPPORTED_Autoneg; + et_cmd->advertising = ADVERTISED_1000baseT_Full|ADVERTISED_Autoneg; + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], + &phy_data); + if (phy_data&0x1000) { + et_cmd->autoneg = AUTONEG_ENABLE; + } else { + et_cmd->autoneg = AUTONEG_DISABLE; + } + et_cmd->transceiver = XCVR_EXTERNAL; + et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } - et_cmd->transceiver = XCVR_EXTERNAL; - et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } else { - if (nesadapter->phy_type[nesvnic->logical_port] == NES_PHY_TYPE_IRIS) { + if ( (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_ARGUS) ) { et_cmd->transceiver = XCVR_EXTERNAL; et_cmd->port = PORT_FIBRE; et_cmd->supported = SUPPORTED_FIBRE; @@ -1422,7 +1432,7 @@ static int nes_netdev_set_settings(struc struct nes_adapter *nesadapter = nesdev->nesadapter; u16 phy_data; - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G)) { nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], &phy_data); if (et_cmd->autoneg) { @@ -1615,27 +1625,34 @@ struct net_device *nes_netdev_init(struc list_add_tail(&nesvnic->list, &nesdev->nesadapter->nesvnic_list[nesdev->mac_index]); if ((nesdev->netdev_count == 0) && - (PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index)) { + ((PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index) || + ((nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) && + (((PCI_FUNC(nesdev->pcidev->devfn) == 1) && (nesdev->mac_index == 
2)) || + ((PCI_FUNC(nesdev->pcidev->devfn) == 2) && (nesdev->mac_index == 1)) ) ) ) ) { +/* PUMA HACK nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. Using register index 0x%04X\n", NES_IDX_PHY_PCS_CONTROL_STATUS0+(0x200*(nesvnic->logical_port&1))); +*/ u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1))); - u32temp |= 0x00200000; - nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1)), u32temp); + (0x200*(nesdev->mac_index&1))); + if (nesdev->nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G) { + u32temp |= 0x00200000; + nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + + (0x200*(nesdev->mac_index&1)), u32temp); + } + u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1)) ); + (0x200*(nesdev->mac_index&1)) ); + if ((u32temp&0x0f1f0000) == 0x0f0f0000) { - if (nesdev->nesadapter->phy_type[nesvnic->logical_port] == NES_PHY_TYPE_IRIS) { + if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) { nes_init_phy(nesdev); - nes_read_10G_phy_reg(nesdev, 1, - nesdev->nesadapter->phy_index[nesvnic->logical_port]); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); u32temp = 20; do { - nes_read_10G_phy_reg(nesdev, 1, - nesdev->nesadapter->phy_index[nesvnic->logical_port]); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if ((phy_data == temp_phy_data) || (!(--u32temp))) @@ -1652,6 +1669,14 @@ struct net_device *nes_netdev_init(struc nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); nesvnic->linkup = 1; } + } else if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + nes_debug(NES_DBG_INIT, "mac_index=%d, logical_port=%d, u32temp=0x%04X, 
PCI_FUNC=%d\n", + nesdev->mac_index, nesvnic->logical_port, u32temp, PCI_FUNC(nesdev->pcidev->devfn)); + if (((nesdev->mac_index < 2) && ((u32temp&0x01010000) == 0x01010000) ) || + ((nesdev->mac_index > 1) && ((u32temp&0x02020000) == 0x02020000) ) ) { + nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); + nesvnic->linkup = 1; + } } /* clear the MAC interrupt status, assumes direct logical to physical mapping */ u32temp = nes_read_indexed(nesdev, NES_IDX_MAC_INT_STATUS + (0x200 * nesdev->mac_index)); diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c index c6d5631..fe83d1b 100644 --- a/drivers/infiniband/hw/nes/nes_utils.c +++ b/drivers/infiniband/hw/nes/nes_utils.c @@ -444,15 +444,13 @@ void nes_read_1G_phy_reg(struct nes_devi /** * nes_write_10G_phy_reg */ -void nes_write_10G_phy_reg(struct nes_device *nesdev, u16 phy_reg, - u8 phy_addr, u16 data) +void nes_write_10G_phy_reg(struct nes_device *nesdev, u16 phy_addr, u8 dev_addr, u16 phy_reg, + u16 data) { - u32 dev_addr; u32 port_addr; u32 u32temp; u32 counter; - dev_addr = 1; port_addr = phy_addr; /* set address */ @@ -492,14 +490,12 @@ void nes_write_10G_phy_reg(struct nes_de * This routine only issues the read, the data must be read * separately. */ -void nes_read_10G_phy_reg(struct nes_device *nesdev, u16 phy_reg, u8 phy_addr) +void nes_read_10G_phy_reg(struct nes_device *nesdev, u8 phy_addr, u8 dev_addr, u16 phy_reg) { - u32 dev_addr; u32 port_addr; u32 u32temp; u32 counter; - dev_addr = 1; port_addr = phy_addr; /* set address */ From gstreiff at neteffect.com Mon Apr 28 21:26:43 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Mon, 28 Apr 2008 23:26:43 -0500 Subject: [ofa-general] [ PATCH 3/3 ] RDMA/nes SFP+ cleanup Message-ID: <200804290426.m3T4QhJl018196@velma.neteffect.com> Clean up the SFP+ patch. 
Signed-off-by: Glenn Streiff --- drivers/infiniband/hw/nes/nes_hw.c | 279 ++++++++++++++++++----------------- drivers/infiniband/hw/nes/nes_nic.c | 63 ++++---- 2 files changed, 178 insertions(+), 164 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 19f2a5b..dce2d66 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -1214,9 +1214,9 @@ int nes_init_phy(struct nes_device *nesd u16 phy_data; u32 temp_phy_data = 0; u32 temp_phy_data2 = 0; - u32 i =0; + u32 i = 0; - if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ mac_index ] != NES_PHY_TYPE_PUMA_1G)) { nes_debug(NES_DBG_PHY, "1G PHY, mac_index = %d.\n", mac_index); if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_1G) { printk(PFX "%s: Programming mdc config for 1G\n", __func__); @@ -1225,17 +1225,17 @@ int nes_init_phy(struct nes_device *nesd nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } - nes_read_1G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 1, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 1 phy address %u = 0x%X.\n", - nesadapter->phy_index[mac_index], phy_data); - nes_write_1G_phy_reg(nesdev, 23, nesadapter->phy_index[mac_index], 0xb000); + nesadapter->phy_index[ mac_index ], phy_data); + nes_write_1G_phy_reg(nesdev, 23, nesadapter->phy_index[ mac_index ], 0xb000); /* Reset the PHY */ - nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], 0x8000); + nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], 0x8000); udelay(100); counter = 0; do { - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0 = 0x%X.\n", phy_data); if 
(counter++ > 100) break; } while (phy_data & 0x8000); @@ -1243,145 +1243,156 @@ int nes_init_phy(struct nes_device *nesd /* Setting no phy loopback */ phy_data &= 0xbfff; phy_data |= 0x1140; - nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], phy_data); - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); + nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], phy_data); + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0 = 0x%X.\n", phy_data); - nes_read_1G_phy_reg(nesdev, 0x17, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0x17, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x17 = 0x%X.\n", phy_data); - nes_read_1G_phy_reg(nesdev, 0x1e, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0x1e, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x1e = 0x%X.\n", phy_data); /* Setting the interrupt mask */ - nes_read_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x19 = 0x%X.\n", phy_data); - nes_write_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[mac_index], 0xffee); + nes_write_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[ mac_index ], 0xffee); - nes_read_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x19 = 0x%X.\n", phy_data); /* turning on flow control */ - nes_read_1G_phy_reg(nesdev, 4, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 4, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x4 = 0x%X.\n", 
phy_data); - nes_write_1G_phy_reg(nesdev, 4, nesadapter->phy_index[mac_index], + nes_write_1G_phy_reg(nesdev, 4, nesadapter->phy_index[ mac_index ], (phy_data & ~(0x03E0)) | 0xc00); - /* nes_write_1G_phy_reg(nesdev, 4, nesadapter->phy_index[mac_index], - phy_data | 0xc00); */ - nes_read_1G_phy_reg(nesdev, 4, nesadapter->phy_index[mac_index], &phy_data); + + /* + * nes_write_1G_phy_reg(nesdev, 4, nesadapter->phy_index[ mac_index ], + * phy_data | 0xc00); + */ + nes_read_1G_phy_reg(nesdev, 4, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x4 = 0x%X.\n", phy_data); - nes_read_1G_phy_reg(nesdev, 9, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 9, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x9 = 0x%X.\n", phy_data); + /* Clear Half duplex */ - nes_write_1G_phy_reg(nesdev, 9, nesadapter->phy_index[mac_index], + nes_write_1G_phy_reg(nesdev, 9, nesadapter->phy_index[ mac_index ], phy_data & ~(0x0100)); - nes_read_1G_phy_reg(nesdev, 9, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 9, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x9 = 0x%X.\n", phy_data); - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); - nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], phy_data | 0x0300); + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], &phy_data); + nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], phy_data | 0x0300); } else { - if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) || (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { + if ((nesadapter->phy_type[ mac_index ] == NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[ mac_index ] == NES_PHY_TYPE_ARGUS)) { /* setup 10G MDIO operation */ tx_config = nes_read_indexed(nesdev, NES_IDX_MAC_TX_CONFIG); tx_config |= 0x14; 
nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } - if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + if ((nesadapter->phy_type[ mac_index ] == NES_PHY_TYPE_ARGUS)) { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7ee); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); mdelay(10); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7ee); temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); /* if firmware is already running (like from a driver un-load/load, don't do anything. */ if (temp_phy_data == temp_phy_data2) { /* configure QT2505 AMCC PHY */ - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0x0000, 0x8000); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0000); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc302, 0x0044); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc318, 0x0052); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc319, 0x0008); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc31a, 0x0098); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0026, 0x0E00); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0027, 0x0000); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0028, 0xA528); - - //remove micro from reset; chip boots from ROM, uploads EEPROM f/w image, uC executes f/w - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0002); - - //wait for heart beat to start to know loading is done + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0x0000, 0x8000); + nes_write_10G_phy_reg(nesdev, 
nesadapter->phy_index[ mac_index ], 0x1, 0xc300, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc302, 0x0044); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc318, 0x0052); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc319, 0x0008); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc31a, 0x0098); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0x0026, 0x0E00); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0x0027, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0x0028, 0xA528); + + /* + * remove micro from reset; chip boots from ROM, + * uploads EEPROM f/w image, uC executes f/w + */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc300, 0x0002); + + /* wait for heart beat to start to know loading is done */ counter = 0; do { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7ee); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if (counter++ > 1000) { nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from heartbeat check \n"); break; } mdelay(100); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7ee); temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); } while ( (temp_phy_data2 == temp_phy_data) ); - - //wait for tracking to start to know f/w is good to go. 
+ /* wait for tracking to start to know f/w is good to go */ counter = 0; do { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7fd); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7fd); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if (counter++ > 1000) { nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from status check \n"); break; } mdelay(1000); -// nes_debug(NES_DBG_PHY, "AMCC PHY- phy_status not ready yet = 0x%02X\n", temp_phy_data); - } while ( ((temp_phy_data & 0xff) != 0x50) && ((temp_phy_data & 0xff) != 0x70) ); + /* nes_debug(NES_DBG_PHY, "AMCC PHY- phy_status not ready yet = 0x%02X\n", temp_phy_data); */ + } while (((temp_phy_data & 0xff) != 0x50) && ((temp_phy_data & 0xff) != 0x70)); + + /* set LOS Control invert RXLOSB_I_PADINV */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xd003, 0x0000); + /* set LOS Control to mask of RXLOSB_I */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc314, 0x0042); + /* set LED1 to input mode (LED1 and LED2 share same LED) */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xd006, 0x0007); + /* set LED2 to RX link_status and activity */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xd007, 0x000A); - //set LOS Control invert RXLOSB_I_PADINV - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd003, 0x0000); - //set LOS Control to mask of RXLOSB_I - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc314, 0x0042); - //set LED1 to input mode (LED1 and LED2 share same LED) - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd006, 0x0007); - //set LED2 to RX link_status and activity - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd007, 0x000A); - //set LED3 to RX link_status - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd008, 0x0009); 
+ /* set LED3 to RX link_status */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xd008, 0x0009); - // reset the res-calibration on t2 serdes, ensures it is stable after the amcc phy is stable. + /* + * reset the res-calibration on t2 serdes, ensures it is stable + * after the amcc phy is stable. + */ - sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); + sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); sds_common_control0 |= 0x1; nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); - //release the res-calibration reset. + /* release the res-calibration reset */ sds_common_control0 &= 0xfffffffe; nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); - - i=0; - while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) && (i++ < 5000)) { + i = 0; + while (((nes_read32(nesdev->regs + NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) + && (i++ < 5000)) { /* mdelay(1); */ } - - - // wait for link train done before moving on, or will get an interupt storm + /* wait for link train done before moving on, or will get an interupt storm */ counter = 0; do { - temp_phy_data = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 +(0x200*(nesdev->mac_index&1) )); + temp_phy_data = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + + (0x200 * (nesdev->mac_index & 1))); if (counter++ > 1000) { - nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from link train wait \n"); + nes_debug(NES_DBG_PHY, + "AMCC PHY- breaking from link train wait \n"); break; } mdelay(1); - } while ( ((temp_phy_data & 0x0f1f0000) != 0x0f0f0000) ); + } while (((temp_phy_data & 0x0f1f0000) != 0x0f0f0000)); } } } @@ -2271,30 +2282,30 @@ static void nes_process_mac_intr(struct } nes_debug(NES_DBG_PHY, "Eth SERDES Common Status: 0=0x%08X, 1=0x%08X\n", nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0), - 
nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0+0x200)); + nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0 + 0x200)); - if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_PUMA_1G) { + if (nesadapter->phy_type[ mac_index ] == NES_PHY_TYPE_PUMA_1G) { switch (mac_index) { - case 1: - case 3: - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + 0x200); - break; - default: - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0); - break; + case 1: + case 3: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + 0x200); + break; + default: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0); + break; } } else { pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index & 1) * 0x200)); pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index & 1) * 0x200)); } nes_debug(NES_DBG_PHY, "PCS PHY Control/Status%u: 0x%08X\n", mac_index, pcs_control_status); - if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ mac_index ] != NES_PHY_TYPE_PUMA_1G)) { u32temp = 0x01010000; if (nesadapter->port_count > 2) { u32temp |= 0x02020000; @@ -2304,56 +2315,58 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "PCS says the link is down\n"); } } else { - switch (nesadapter->phy_type[mac_index]) { - case NES_PHY_TYPE_IRIS: - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); - temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); - u32temp = 20; - do { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); - phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); - if ((phy_data == 
temp_phy_data) || (!(--u32temp))) - break; - temp_phy_data = phy_data; - } while (1); - nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", - __func__, phy_data, nesadapter->mac_link_down[mac_index] ? "DOWN" : "UP"); - break; - case NES_PHY_TYPE_ARGUS: - //clear the alarms. - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0x0008); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc001); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc002); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc005); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc006); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9003); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9004); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9005); - //check link status + switch (nesadapter->phy_type[ mac_index ]) { + case NES_PHY_TYPE_IRIS: + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 20; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 1); + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down[mac_index] ? "DOWN" : "UP"); + break; + + case NES_PHY_TYPE_ARGUS: + //clear the alarms. 
+ nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0x0008); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0xc001); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0xc002); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0xc005); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0xc006); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 0x9003); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 0x9004); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 0x9005); + //check link status + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 100; + do { nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); - temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); - u32temp = 100; - do { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); - phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); - if ((phy_data == temp_phy_data) || (!(--u32temp))) - break; - temp_phy_data = phy_data; - } while (1); - nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", - __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); - break; - case NES_PHY_TYPE_PUMA_1G: - if (mac_index < 2) { - pcs_val = pcs_mask = 0x01010000; - } else { - pcs_val = pcs_mask = 0x02020000; - } - /* fall through */ - default: - phy_data = (pcs_val == (pcs_control_status & pcs_mask)) ? 0x4 : 0x0; - break; + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down ? 
"DOWN" : "UP"); + break; + + case NES_PHY_TYPE_PUMA_1G: + if (mac_index < 2) { + pcs_val = pcs_mask = 0x01010000; + } else { + pcs_val = pcs_mask = 0x02020000; + } + /* fall through */ + default: + phy_data = (pcs_val == (pcs_control_status & pcs_mask)) ? 0x4 : 0x0; + break; } } diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 5ba9dd3..939887a 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -1376,20 +1376,20 @@ static int nes_netdev_get_settings(struc u16 phy_data; et_cmd->duplex = DUPLEX_FULL; - et_cmd->port = PORT_MII; + et_cmd->port = PORT_MII; if (nesadapter->OneG_Mode) { et_cmd->speed = SPEED_1000; - if (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { - et_cmd->supported = SUPPORTED_1000baseT_Full; + if (nesadapter->phy_type[ nesdev->mac_index ] == NES_PHY_TYPE_PUMA_1G) { + et_cmd->supported = SUPPORTED_1000baseT_Full; et_cmd->advertising = ADVERTISED_1000baseT_Full; - et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->autoneg = AUTONEG_DISABLE; et_cmd->transceiver = XCVR_INTERNAL; et_cmd->phy_address = nesdev->mac_index; } else { - et_cmd->supported = SUPPORTED_1000baseT_Full|SUPPORTED_Autoneg; - et_cmd->advertising = ADVERTISED_1000baseT_Full|ADVERTISED_Autoneg; - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], + et_cmd->supported = SUPPORTED_1000baseT_Full | SUPPORTED_Autoneg; + et_cmd->advertising = ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg; + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ nesdev->mac_index ], &phy_data); if (phy_data&0x1000) { et_cmd->autoneg = AUTONEG_ENABLE; @@ -1400,20 +1400,20 @@ static int nes_netdev_get_settings(struc et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } } else { - if ( (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) || - (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_ARGUS) ) { + if ((nesadapter->phy_type[ nesdev->mac_index ] == 
NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[ nesdev->mac_index ] == NES_PHY_TYPE_ARGUS)) { et_cmd->transceiver = XCVR_EXTERNAL; - et_cmd->port = PORT_FIBRE; - et_cmd->supported = SUPPORTED_FIBRE; + et_cmd->port = PORT_FIBRE; + et_cmd->supported = SUPPORTED_FIBRE; et_cmd->advertising = ADVERTISED_FIBRE; - et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; + et_cmd->phy_address = nesadapter->phy_index[ nesdev->mac_index ]; } else { et_cmd->transceiver = XCVR_INTERNAL; - et_cmd->supported = SUPPORTED_10000baseT_Full; + et_cmd->supported = SUPPORTED_10000baseT_Full; et_cmd->advertising = ADVERTISED_10000baseT_Full; et_cmd->phy_address = nesdev->mac_index; } - et_cmd->speed = SPEED_10000; + et_cmd->speed = SPEED_10000; et_cmd->autoneg = AUTONEG_DISABLE; } et_cmd->maxtxpkt = 511; @@ -1432,17 +1432,18 @@ static int nes_netdev_set_settings(struc struct nes_adapter *nesadapter = nesdev->nesadapter; u16 phy_data; - if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G)) { - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[ nesdev->mac_index ] != NES_PHY_TYPE_PUMA_1G)) { + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ nesdev->mac_index ], &phy_data); if (et_cmd->autoneg) { /* Turn on Full duplex, Autoneg, and restart autonegotiation */ phy_data |= 0x1300; } else { - // Turn off autoneg + /* Turn off autoneg */ phy_data &= ~0x1000; } - nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], + nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ nesdev->mac_index ], phy_data); } @@ -1628,13 +1629,13 @@ struct net_device *nes_netdev_init(struc ((PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index) || ((nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) && (((PCI_FUNC(nesdev->pcidev->devfn) == 1) && (nesdev->mac_index == 2)) || - ((PCI_FUNC(nesdev->pcidev->devfn) == 2) && (nesdev->mac_index == 
1)) ) ) ) ) { -/* PUMA HACK - nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. Using register index 0x%04X\n", - NES_IDX_PHY_PCS_CONTROL_STATUS0+(0x200*(nesvnic->logical_port&1))); -*/ + ((PCI_FUNC(nesdev->pcidev->devfn) == 2) && (nesdev->mac_index == 1)))))){ + /* + * nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. Using register index 0x%04X\n", + * NES_IDX_PHY_PCS_CONTROL_STATUS0 + (0x200 * (nesvnic->logical_port & 1))); + */ u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesdev->mac_index&1))); + (0x200*(nesdev->mac_index & 1))); if (nesdev->nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G) { u32temp |= 0x00200000; nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + @@ -1645,14 +1646,14 @@ struct net_device *nes_netdev_init(struc (0x200*(nesdev->mac_index&1)) ); if ((u32temp&0x0f1f0000) == 0x0f0f0000) { - if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) { + if (nesdev->nesadapter->phy_type[ nesdev->mac_index ] == NES_PHY_TYPE_IRIS) { nes_init_phy(nesdev); - nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[ nesdev->mac_index ], 1, 1); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); u32temp = 20; do { - nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[ nesdev->mac_index ], 1, 1); phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if ((phy_data == temp_phy_data) || (!(--u32temp))) @@ -1669,11 +1670,11 @@ struct net_device *nes_netdev_init(struc nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); nesvnic->linkup = 1; } - } else if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + } else if (nesdev->nesadapter->phy_type[ nesdev->mac_index ] == NES_PHY_TYPE_PUMA_1G) { nes_debug(NES_DBG_INIT, "mac_index=%d, 
logical_port=%d, u32temp=0x%04X, PCI_FUNC=%d\n", nesdev->mac_index, nesvnic->logical_port, u32temp, PCI_FUNC(nesdev->pcidev->devfn)); - if (((nesdev->mac_index < 2) && ((u32temp&0x01010000) == 0x01010000) ) || - ((nesdev->mac_index > 1) && ((u32temp&0x02020000) == 0x02020000) ) ) { + if (((nesdev->mac_index < 2) && ((u32temp&0x01010000) == 0x01010000)) || + ((nesdev->mac_index > 1) && ((u32temp&0x02020000) == 0x02020000))) { nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); nesvnic->linkup = 1; } @@ -1683,7 +1684,7 @@ struct net_device *nes_netdev_init(struc nes_debug(NES_DBG_INIT, "Phy interrupt status = 0x%X.\n", u32temp); nes_write_indexed(nesdev, NES_IDX_MAC_INT_STATUS + (0x200 * nesdev->mac_index), u32temp); - if (nesdev->nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_IRIS) + if (nesdev->nesadapter->phy_type[ nesdev->mac_index ] != NES_PHY_TYPE_IRIS) nes_init_phy(nesdev); } From sashak at voltaire.com Tue Apr 29 01:01:41 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 29 Apr 2008 08:01:41 +0000 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080428110332.6fb8e1d8.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <20080427171140.GI22406@sashak.voltaire.com> <20080428110332.6fb8e1d8.weiny2@llnl.gov> Message-ID: <20080429080141.GE20790@sashak.voltaire.com> On 11:03 Mon 28 Apr , Ira Weiny wrote: > > Yes I agree. Per my previous mail to Or I found that light sweeps did not in > fact notice the nodes were gone. Looking at the logs I am not sure what > caused OpenSM to notice them. However, something must have triggered a heavy > sweep when those nodes were catatonic. From the logs they were unresponsive > for multiple seconds, some as long as 30s. It is still a bit of a mystery why > OpenSM did a heavy sweep during this period but I don't think it is > unreasonable for it to do so. Could you send me log file? 
Sasha From jackm at dev.mellanox.co.il Mon Apr 28 23:40:37 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 29 Apr 2008 09:40:37 +0300 Subject: [ofa-general] Re: [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: References: <200804281438.28417.jackm@dev.mellanox.co.il> Message-ID: <200804290940.37587.jackm@dev.mellanox.co.il> On Monday 28 April 2008 18:50, Roland Dreier wrote: > Hmm... wouldn't it be better to follow the same interface as ib_mthca > and have consumers pass in the numbers instead of the log sizes? Having > two different ways of changing the same parameters seems pretty confusing. Dotan also mentioned this. Our preference was to change ib_mthca to use logs as well, since this way the user knows exactly what the amounts will be (instead of having a hidden "round up to next or equal power of 2"). I just did not get around to doing this change for ib_mthca. I'll do this for the next release. - Jack P.S. BTW, I think there is a bug in the mthca driver, which messes things up if the profile numbers are NOT powers of 2 (from mthca_make_profile, in file mthca_profile.c): for (i = 0; i < MTHCA_RES_NUM; ++i) { profile[i].type = i; profile[i].log_num = max(ffs(profile[i].num) - 1, 0); profile[i].size *= profile[i].num; should be: for (i = 0; i < MTHCA_RES_NUM; ++i) { profile[i].type = i; profile[i].num = roundup_pow_of_two(profile[i].num); profile[i].log_num = ilog2(profile[i].num); profile[i].size *= profile[i].num; since later the procedure assumes that all sizes are powers of 2. 
From ogerlitz at voltaire.com Mon Apr 28 23:57:58 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 29 Apr 2008 09:57:58 +0300 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group In-Reply-To: <20080428091923.0abf9fb5.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> <20080424181657.28d58a29.weiny2@llnl.gov> <48143DBA.3080701@voltaire.com> <20080428091923.0abf9fb5.weiny2@llnl.gov> Message-ID: <4816C6F6.6000602@voltaire.com> Ira Weiny wrote: >> OK, good. Does this problem exist in the released openSM? if yes, what >> would be the trigger for the SM to "really discover" (i.e do PortInfo >> SET) this sub-fabric and how much time would it take to reach this >> trigger, worst case wise? > Yes, this is in the current released version of OpenSM, AFAICT. The trigger > is: the single link separating the partial sub net will come up and that trap > will cause OpenSM to resweep. I believe this will happen on the next resweep > cycle which is by default 10 sec. (But this is configurable.) I don't think > there is an issue with allowing OpenSM to resweep as designed. And when OpenSM does the heavy sweep, which nodes will have their client rereg bit set, only the ones beyond the recovered link? Also, will OpenSM cycle the logical link state of those nodes (which is ACTIVE!) through ARMED and ACTIVE again, or will the only SET be for the rereg bit? Or. From jackm at dev.mellanox.co.il Mon Apr 28 23:59:41 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 29 Apr 2008 09:59:41 +0300 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: References: <200804241106.57172.okir@lst.de> <200804241109.52448.okir@lst.de> Message-ID: <200804290959.41881.jackm@dev.mellanox.co.il> On Saturday 26 April 2008 00:33, Roland Dreier wrote: > Looks mostly OK... 
the only thing I worry about is in the Sinai > optimization case, do we run into trouble with bits getting carried into > the top bits of the key? > > Can someone from Mellanox review this more carefully? Olaf submitted such a patch (as an RFC) in February, and we had a discussion thread then: http://lists.openfabrics.org/pipermail/general/2008-February/046863.html We concluded at that time that the patch was OK. I also reviewed the patch again (especially the Sinai optimization), and the patch is OK there, too: The key-adjustment games are all within the index portion of the key. The spare-bits portion is outside this range (i.e., the most significant byte only). For Sinai, the increment used on the key when remapping is 0x10000000 (low order bit of most significant byte), so there is no influence on the index portion (which therefore remains constant, as it should). As such, there is no need to call adjust-key() when unmapping an fmr -- since the index portion of the key is not touched -- neither in map, nor in unmap. - Jack From dorfman.eli at gmail.com Tue Apr 29 00:33:07 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Tue, 29 Apr 2008 10:33:07 +0300 Subject: [ofa-general] [PATCH] IB/iSER: Count fmr alignment violations per session Message-ID: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> Count fmr alignment violations per session as part of the iscsi statistics. 
Signed-off-by: Eli Dorfman --- drivers/infiniband/ulp/iser/iscsi_iser.c | 4 +++- drivers/infiniband/ulp/iser/iser_memory.c | 2 ++ include/scsi/libiscsi.h | 1 + 3 files changed, 6 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 451e601..df44fa7 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -472,13 +472,15 @@ iscsi_iser_conn_get_stats(struct iscsi_cls_conn *cls_conn, struct iscsi_stats *s stats->r2t_pdus = conn->r2t_pdus_cnt; /* always 0 */ stats->tmfcmd_pdus = conn->tmfcmd_pdus_cnt; stats->tmfrsp_pdus = conn->tmfrsp_pdus_cnt; - stats->custom_length = 3; + stats->custom_length = 4; strcpy(stats->custom[0].desc, "qp_tx_queue_full"); stats->custom[0].value = 0; /* TB iser_conn->qp_tx_queue_full; */ strcpy(stats->custom[1].desc, "fmr_map_not_avail"); stats->custom[1].value = 0; /* TB iser_conn->fmr_map_not_avail */; strcpy(stats->custom[2].desc, "eh_abort_cnt"); stats->custom[2].value = conn->eh_abort_cnt; + strcpy(stats->custom[3].desc, "fmr_unalign_cnt"); + stats->custom[3].value = conn->fmr_unalign_cnt; } static int diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index ee58199..cac50c4 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -423,6 +423,7 @@ void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask) int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, enum iser_data_dir cmd_dir) { + struct iscsi_conn *iscsi_conn = iser_ctask->iser_conn->iscsi_conn; struct iser_conn *ib_conn = iser_ctask->iser_conn->ib_conn; struct iser_device *device = ib_conn->device; struct ib_device *ibdev = device->ib_device; @@ -437,6 +438,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, aligned_len = iser_data_buf_aligned_len(mem, ibdev); if (aligned_len != mem->dma_nents) { + 
iscsi_conn->fmr_unalign_cnt++; iser_warn("rdma alignment violation %d/%d aligned\n", aligned_len, mem->size); iser_data_buf_dump(mem, ibdev); diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h index 7b90b63..cd3ca63 100644 --- a/include/scsi/libiscsi.h +++ b/include/scsi/libiscsi.h @@ -225,6 +225,7 @@ struct iscsi_conn { /* custom statistics */ uint32_t eh_abort_cnt; + uint32_t fmr_unalign_cnt; }; struct iscsi_pool { -- 1.5.5 From Jean-Francois.Neyroud at bull.net Tue Apr 29 01:17:38 2008 From: Jean-Francois.Neyroud at bull.net (Jean-Francois.Neyroud) Date: Tue, 29 Apr 2008 10:17:38 +0200 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function Message-ID: <4816D9A2.7040009@bull.net> If I attempt to query the performance counters on all nodes of a cluster (40 nodes) at the same time, perfquery causes the kernel to get stuck in the ib_unregister_mad_agent() function. Impossible to send CTRL-C or CTRL-Z to perfquery; it is stuck in the kernel. # pgrep perfquery 27578 # cat /proc/27578/wchan ib_unregister_mad_agent I have this problem with OFED-1.2.5 or 1.3 and with mthca or ConnectX; not tested with other HCAs or OFED versions. Reproducer with 2 nodes and without a switch: # for i in `seq 1 100`; do perfquery >/dev/null 2>&1 & done # pgrep perfquery | while read pid; do echo "$pid: `cat /proc/$pid/wchan`"; echo; done | dshbak -c ---------------- [14936,14938-15029] ---------------- 0 ---------------- ---------------- ---------------- 14937 ---------------- flush_cpu_workqueue Does anyone know this problem? Jean-Francois.
From eli at dev.mellanox.co.il Tue Apr 29 02:17:33 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 29 Apr 2008 12:17:33 +0300 Subject: [ofa-general] [PATCH] IB/ipoib: set child MTU as the parent's Message-ID: <1209460653.28929.1.camel@mtls03> >From 71e918e23f7f8815f3248c1089f69680ae6a203b Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Tue, 29 Apr 2008 11:48:09 +0300 Subject: [PATCH] IB/ipoib: set child MTU as the parent's When the child joins the broadcast group reset the mtu to the real one. Signed-off-by: Eli Cohen --- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 431fdea..872b670 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -90,6 +90,9 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) } priv->max_ib_mtu = ppriv->max_ib_mtu; + /* MTU will be reset when mcast join happens */ + priv->dev->mtu = IPOIB_UD_MTU(priv->max_ib_mtu); + priv->mcast_mtu = priv->admin_mtu = priv->dev->mtu; set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags); priv->pkey = pkey; -- 1.5.5 From vlad at dev.mellanox.co.il Tue Apr 29 03:40:55 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 29 Apr 2008 13:40:55 +0300 Subject: [ofa-general] [PATCH 0/8] RDS patch set In-Reply-To: <200804241114.51260.okir@lst.de> References: <200804241114.51260.okir@lst.de> Message-ID: <4816FB37.8010407@dev.mellanox.co.il> Olaf Kirch wrote: > Hi all, > > here's another set of patches related to RDS. The patches can be found > in git://git.openfabrics.org/ofed_1_3/linux-2.6 > and git://git.openfabrics.org/ofed_1_3/rds-tools > > There are seven kernel patches. I would very much like to see the first > four of them in OFED 1.3.1 if possible. On the remaining 3, I'm not > particularly religious - I'm fine if they make it into 1.3.* at a later > time. 
> > RDS: Fix IB max_unacked_* sysctls > Straightforward bugfix. > > mthca/mlx4: avoid recycling old FMR R_Keys too soon > This is a re-run of a mthca patch I posted a while back; Jack > Morgenstein requested that I should make the same change in the > mlx4 driver. Here it is; review and feedback much appreciated. > > Reduce struct rds_ib_send_work size > RDS: Increase the default number of WRs > These two patches go together; they shrink the size of the > send work entry we allocate in favor of allocating more of them. > I would very much like to see these in OFED 1.3.1 > > RDS: Two small code reorgs in the connection code > RDS: Use IB for loopback > These also go together. For loopback traffic, we need to use > IB if available, instead of the special loopback transport currently > used. The reason is that lots of our tests run on single hosts over > loopback, and we want to stress things like RDMA. > > RDS: Implement rds ping > This is really a new feature. Essentially, ping over RDS. > > There's a companion patch to rds-tools that implements the rds-ping > user space utility that leverages the functionality added by the kernel > patch above. > > Olaf Applied to ofed_1_3/linux-2.6.git and to ofed_1_3/rds-tools.git. The following OFED build includes these patches: http://www.openfabrics.org/builds/ofed-1.3.1/OFED-1.3.1-20080429-0110.tgz Regards, Vladimir From hrosenstock at xsigo.com Tue Apr 29 03:49:20 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Tue, 29 Apr 2008 03:49:20 -0700 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function In-Reply-To: <4816D9A2.7040009@bull.net> References: <4816D9A2.7040009@bull.net> Message-ID: <1209466160.689.433.camel@hrosenstock-ws.xsigo.com> Hi Jean-Francois, On Tue, 2008-04-29 at 10:17 +0200, Jean-Francois.Neyroud wrote: > If I attemp to query at the same time the performance counters on all > nodes on a cluster ( 40 nodes) . 
> perfquery causes kernel to be stuck in ib_unregister_mad_agent() function. > > Impossible to send CTRL-C or CTRL-Z to perfquery, it is stuck in the kernel. > # pgrep perfquery > 27578 > # cat /proc/27578/wchan > ib_unregister_mad_agent > > I have this problem with OFED-1.2.5 or 1.3 and with mthca or ConnectX, > not tested with others HCA and OFED. > > Reproduceur with 2 nodes and without switch: > > # for i in `seq 1 100`; do perfquery >/dev/null 2>&1 & done > > # pgrep perfquery | while read pid; do echo "$pid: `cat /proc/$pid/wchan`"; echo; done | dshbak -c > ---------------- > [14936,14938-15029] > ---------------- > 0 > ---------------- > > ---------------- > ---------------- > 14937 > ---------------- > flush_cpu_workqueue > > > Does anyone know this problem ? This could be related to the lock dependency issue discussed in the following thread: http://lists.openfabrics.org/pipermail/general/2008-January/044723.html You might want to look to the following for the actual fix: commit 2fe7e6f7c9f55eac24c5b3cdf56af29ab9b0ca81 Author: Roland Dreier Date: Fri Jan 25 14:15:42 2008 -0800 IB/umad: Simplify and fix locking In addition to being overly complex, the locking in user_mad.c is broken: there were multiple reports of deadlocks and lockdep warnings. In particular it seems that a single thread may end up trying to take the same rwsem for reading more than once, which is explicitly forbidden in the comments in . To solve this, we change the locking to use plain mutexes instead of rwsems. There is one mutex per open file, which protects the contents of the struct ib_umad_file, including the array of agents and list of queued packets; and there is one mutex per struct ib_umad_port, which protects the contents, including the list of open files. 
We never hold the file mutex across calls to functions like ib_unregister_mad_agent() , which can call back into other ib_umad code to queue a packet, and we always hold the port mutex as long as we need to make sure that a device is not hot-unplugged from under us. This even makes things nicer for users of the -rt patch, since we remove calls to downgrade_write() (which is not implemented in -rt). Signed-off-by: Roland Dreier I don't think this change was incorporated into either OFED 1.2.5 or 1.3. -- Hal > > Jean-Francois. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hugh at veritas.com Tue Apr 29 03:49:11 2008 From: hugh at veritas.com (Hugh Dickins) Date: Tue, 29 Apr 2008 11:49:11 +0100 (BST) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080429001052.GA8315@duo.random> References: <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> Message-ID: On Tue, 29 Apr 2008, Andrea Arcangeli wrote: > > My point of view is that there was no rcu when I wrote that code, yet > there was no reference count and yet all locking looks still exactly > the same as I wrote it. There's even still the page_table_lock to > serialize threads taking the mmap_sem in read mode against the first > vma->anon_vma = anon_vma during the page fault. > > Frankly I've absolutely no idea why rcu is needed in all rmap code > when walking the page->mapping. 
Definitely the PG_locked is taken so > there's no way page->mapping could possibly go away under the rmap > code, hence the anon_vma can't go away as it's queued in the vma, and > the vma has to go away before the page is zapped out of the pte. [I'm scarcely following the mmu notifiers to-and-fro, which seems to be in good hands, amongst faster thinkers than me: who actually need and can test this stuff. Don't let me slow you down; but I can quickly clarify on this history.] No, the locking was different as you had it, Andrea: there was an extra bitspin lock, carried over from the pte_chains days (maybe we changed the name, maybe we disagreed over the name, I forget), which mainly guarded the page->mapcount. I thought that was one lock more than we needed, and eliminated it in favour of atomic page->mapcount in 2.6.9. Here's the relevant extracts from ChangeLog-2.6.9: [PATCH] rmaplock: PageAnon in mapping First of a batch of five patches to eliminate rmap's page_map_lock, replace its trylocking by spinlocking, and use anon_vma to speed up swapoff. Patches updated from the originals against 2.6.7-mm7: nothing new so I won't spam the list, but including Manfred's SLAB_DESTROY_BY_RCU fixes, and omitting the unuse_process mmap_sem fix already in 2.6.8-rc3. This patch: Replace the PG_anon page->flags bit by setting the lower bit of the pointer in page->mapping when it's anon_vma: PAGE_MAPPING_ANON bit. We're about to eliminate the locking which kept the flags and mapping in synch: it's much easier to work on a local copy of page->mapping, than worry about whether flags and mapping are in synch (though I imagine it could be done, at greater cost, with some barriers). [PATCH] rmaplock: kill page_map_lock The pte_chains rmap used pte_chain_lock (bit_spin_lock on PG_chainlock) to lock its pte_chains. We kept this (as page_map_lock: bit_spin_lock on PG_maplock) when we moved to objrmap. 
But the file objrmap locks its vma tree with mapping->i_mmap_lock, and the anon objrmap locks its vma list with anon_vma->lock: so isn't the page_map_lock superfluous? Pretty much, yes. The mapcount was protected by it, and needs to become an atomic: starting at -1 like page _count, so nr_mapped can be tracked precisely up and down. The last page_remove_rmap can't clear anon page mapping any more, because of races with page_add_rmap; from which some BUG_ONs must go for the same reason, but they've served their purpose. vmscan decisions are naturally racy, little change there beyond removing page_map_lock/unlock. But to stabilize the file-backed page->mapping against truncation while acquiring i_mmap_lock, page_referenced_file now needs page lock to be held even for refill_inactive_zone. There's a similar issue in acquiring anon_vma->lock, where page lock doesn't help: which this patch pretends to handle, but actually it needs the next. Roughly 10% cut off lmbench fork numbers on my 2*HT*P4. Must confess my testing failed to show the races even while they were knowingly exposed: would benefit from testing on racier equipment. [PATCH] rmaplock: SLAB_DESTROY_BY_RCU With page_map_lock gone, how to stabilize page->mapping's anon_vma while acquiring anon_vma->lock in page_referenced_anon and try_to_unmap_anon? The page cannot actually be freed (vmscan holds reference), but however much we check page_mapped (which guarantees that anon_vma is in use - or would guarantee that if we added suitable barriers), there's no locking against page becoming unmapped the instant after, then anon_vma freed. It's okay to take anon_vma->lock after it's freed, so long as it remains a struct anon_vma (its list would become empty, or perhaps reused for an unrelated anon_vma: but no problem since we always check that the page located is the right one); but corruption if that memory gets reused for some other purpose. 
This is not unique: it's liable to be problem whenever the kernel tries to approach a structure obliquely. It's generally solved with an atomic reference count; but one advantage of anon_vma over anonmm is that it does not have such a count, and it would be a backward step to add one. Therefore... implement SLAB_DESTROY_BY_RCU flag, to guarantee that such a kmem_cache_alloc'ed structure cannot get freed to other use while the rcu_read_lock is held i.e. preempt disabled; and use that for anon_vma. Fix concerns raised by Manfred: this flag is incompatible with poisoning and destructor, and kmem_cache_destroy needs to synchronize_kernel. I hope SLAB_DESTROY_BY_RCU may be useful elsewhere; but though it's safe for little anon_vma, I'd be reluctant to use it on any caches whose immediate shrinkage under pressure is important to the system. [PATCH] rmaplock: mm lock ordering With page_map_lock out of the way, there's no need for page_referenced and try_to_unmap to use trylocks - provided we switch anon_vma->lock and mm->page_table_lock around in anon_vma_prepare. Though I suppose it's possible that we'll find that vmscan makes better progress with trylocks than spinning - we're free to choose trylocks again if so. Try to update the mm lock ordering documentation in filemap.c. But I still find it confusing, and I've no idea of where to stop. So add an mm lock ordering list I can understand to rmap.c. [The fifth patch was about using anon_vma in swapoff, not relevant here.] So, going back to what you wrote: holding the page lock there is not enough to prevent the struct anon_vma going away beneath us. 
Hugh From ogerlitz at voltaire.com Tue Apr 29 04:17:05 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 29 Apr 2008 14:17:05 +0300 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function In-Reply-To: <4816D9A2.7040009@bull.net> References: <4816D9A2.7040009@bull.net> Message-ID: <481703B1.60900@voltaire.com> Jean-Francois.Neyroud wrote: > If I attemp to query at the same time the performance counters on all > nodes on a cluster ( 40 nodes) . > perfquery causes kernel to be stuck in ib_unregister_mad_agent() > function. > Impossible to send CTRL-C or CTRL-Z to perfquery, it is stuck in the > kernel. maybe with $ dmesg -c $ echo 1 > /proc/sysrq-trigger $ echo t > /proc/sysrq-trigger and then looking on the related kernel threads stacks from the dmesg (eg of ib_madX threads, etc) you would get more info that you can share. Or. From tziporet at dev.mellanox.co.il Tue Apr 29 04:39:16 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 29 Apr 2008 14:39:16 +0300 Subject: [ofa-general] install.sh question In-Reply-To: <481652F6.50008@mediaweb.com> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> <481652F6.50008@mediaweb.com> Message-ID: <481708E4.4070306@mellanox.co.il> DK Smith wrote: > > Is the NEW & IMPROVED installer, install.pl, a drop in replacement for > build.sh? > > I recently wrote a set of build scripts that are used to build a > distribution (kernel + modules + root file system) for deployment > elsewhere. (i.e. a non-native build of everything including OFED). > > In the OFED 1.2 installer, I used this method of invocation: > > /build.sh -c > > wherein, build.sh locates the config file, "ofed.conf" in the same > directory. That worked. > > The statement about "run on all cluster nodes" appears to indicate a > non-native build is no-longer possible. 
> > The build.sh was removed from OFED 1.3 and it is explained in the RN: 2.2 Package and install o There is a new install script. See OFED_Installation_Guide.txt for more details on the new installation and build procedures. o User space packages are now in different source RPMs (as opposed to one source RPM in previous OFED releases). o The option for a build without installing is not supported any more. o Added the script make-dist to generate tarball with kernel sources for each kernel. Tziporet From tziporet at dev.mellanox.co.il Tue Apr 29 05:00:44 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 29 Apr 2008 15:00:44 +0300 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: <48170DEC.1020303@mellanox.co.il> Roland Dreier wrote: > > Also it is very important for us that IPoIB 2 kernel panics will be fixed ( > > https://bugs.openfabrics.org/show_bug.cgi?id=989, > > https://bugs.openfabrics.org/show_bug.cgi?id=985) > > > Both should not happen in upstream kernel: 989 - bug in a new optimization of OFED 1.3 (see bug report for details) 985 - bug in backports only (Eli will update the bug and resolution) Tziporet From dpn at isomerica.net Tue Apr 29 05:46:13 2008 From: dpn at isomerica.net (Dan Noe) Date: Tue, 29 Apr 2008 08:46:13 -0400 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function In-Reply-To: <481703B1.60900@voltaire.com> References: <4816D9A2.7040009@bull.net> <481703B1.60900@voltaire.com> Message-ID: <48171895.9050705@isomerica.net> Or Gerlitz wrote: > Jean-Francois.Neyroud wrote: >> If I attemp to query at the same time the performance counters on all >> nodes on a cluster ( 40 nodes) . >> perfquery causes kernel to be stuck in ib_unregister_mad_agent() >> function.
>> Impossible to send CTRL-C or CTRL-Z to perfquery, it is stuck in the >> kernel. > maybe with > $ dmesg -c > $ echo 1 > /proc/sysrq-trigger > $ echo t > /proc/sysrq-trigger Hi, Depending on how recent your kernel is, you can also echo d > /proc/sysrq-trigger which will show the state of currently held locks. Hope that helps. Cheers, Dan -- /--------------- - - - - - - | Dan Noe | http://isomerica.net/~dpn/ From Jean-Francois.Neyroud at bull.net Tue Apr 29 05:55:48 2008 From: Jean-Francois.Neyroud at bull.net (Jean-Francois.Neyroud) Date: Tue, 29 Apr 2008 14:55:48 +0200 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function In-Reply-To: <1209466160.689.433.camel@hrosenstock-ws.xsigo.com> References: <4816D9A2.7040009@bull.net> <1209466160.689.433.camel@hrosenstock-ws.xsigo.com> Message-ID: <48171AD4.2020900@bull.net> Thanks Hal with this fix it's OK. Jean-Francois. > This could be related to the lock dependency issue discussed in the > following thread: > > http://lists.openfabrics.org/pipermail/general/2008-January/044723.html > > You might want to look to the following for the actual fix: > > commit 2fe7e6f7c9f55eac24c5b3cdf56af29ab9b0ca81 > Author: Roland Dreier > Date: Fri Jan 25 14:15:42 2008 -0800 > > From jackm at dev.mellanox.co.il Tue Apr 29 06:31:56 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 29 Apr 2008 16:31:56 +0300 Subject: [ofa-general] Re: [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <200804290940.37587.jackm@dev.mellanox.co.il> References: <200804281438.28417.jackm@dev.mellanox.co.il> <200804290940.37587.jackm@dev.mellanox.co.il> Message-ID: <200804291631.56685.jackm@dev.mellanox.co.il> On Tuesday 29 April 2008 09:40, Jack Morgenstein wrote: > P.S. 
> BTW, I think there is a bug in the mthca driver, which messes things > up if the profile numbers are NOT powers of 2: > (from mthca_make_profile, in file mthca_profile.c): > for (i = 0; i < MTHCA_RES_NUM; ++i) { > profile[i].type = i; > profile[i].log_num = max(ffs(profile[i].num) - 1, 0); > profile[i].size *= profile[i].num; > > should be > for (i = 0; i < MTHCA_RES_NUM; ++i) { > profile[i].type = i; > profile[i].num = roundup_pow_of_two(profile[i].num); > profile[i].log_num = ilog2(profile[i].num); > profile[i].size *= profile[i].num; > > since later the procedure assumes that all sizes are powers of 2. I was wrong -- sorry about that, Roland. I missed the procedure __mthca_check_profile_val() in file mthca_main.c, which does raise the profile values to the next (or same) power-of-2 value, so there is no bug. Still, I feel that it is much cleaner to require the user to specify a power-of-2 directly, rather than correct cases in which the user did not do so. I'm working on a patch for ib_mthca now, on top of your 2.6.26 tree, which will do the job. - Jack From andrea at qumranet.com Tue Apr 29 06:32:35 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 29 Apr 2008 15:32:35 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> Message-ID: <20080429133235.GC8315@duo.random> Hi Hugh!! On Tue, Apr 29, 2008 at 11:49:11AM +0100, Hugh Dickins wrote: > [I'm scarcely following the mmu notifiers to-and-fro, which seems > to be in good hands, amongst faster thinkers than me: who actually > need and can test this stuff. Don't let me slow you down; but I > can quickly clarify on this history.] 
Still I think it'd be great if you could review mmu-notifier-core v14. You and Nick are the core VM maintainers so it'd be great to hear any feedback about it. I think it's fairly easy to classify the patch as obviously safe as long as mmu notifiers are disarmed. Here is a link for your convenience. http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/mmu-notifier-core > No, the locking was different as you had it, Andrea: there was an extra > bitspin lock, carried over from the pte_chains days (maybe we changed > the name, maybe we disagreed over the name, I forget), which mainly > guarded the page->mapcount. I thought that was one lock more than we > needed, and eliminated it in favour of atomic page->mapcount in 2.6.9. Thanks a lot for the explanation! From dks at mediaweb.com Tue Apr 29 06:50:52 2008 From: dks at mediaweb.com (DK Smith) Date: Tue, 29 Apr 2008 06:50:52 -0700 Subject: [ofa-general] install.sh question In-Reply-To: <481708E4.4070306@mellanox.co.il> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> <481652F6.50008@mediaweb.com> <481708E4.4070306@mellanox.co.il> Message-ID: <481727BC.7000709@mediaweb.com> Hello, Thank you for the prompt response and the navigation to the Release Notes. > The build.sh was removed from OFED 1.3 and it is explained in the RN: > 2.2 Package and install > o There is a new install script. See OFED_Installation_Guide.txt for > more details on the new installation and build procedures. > o User space packages are now in different source RPMs (as > opposed to > one source RPM in previous OFED releases). > o The option for a build without installing is not supported any > more. > o Added the script make-dist to generate tarball with kernel sources > for each kernel. > > Tziporet > What was the reason for the decision to drop building without installation? I still have a consideration that remains ambiguous to me.
I believe that my scenario requires a separate build and installation ... and non-natively too. :) In version 1.2.*, I specified the location of the kernel source that I was building against. Then I took the resulting RPM package and installed it into a root file system that is subsequently rolled into a special boot disk which is installed into a Linux "appliance". Is there a way to accomplish this with the new installer? The new installer appears to be restricted to native installations. Is this the case? If so, isn't this also a problem for other people? Usage: ./install.pl [-c |--all|--hpc|--basic] [-n|--net ] -c|--config . Example of the config file can be found under docs. -l|--prefix Set installation prefix. -p|--print-available Print available packages for current platform. And create corresponding ofed.conf file. -k|--kernel . Default on this system: 9.2.2 -s|--kernel-sources . Default on this system: /lib/modules/9.2.2/build --build32 Build 32-bit libraries. Relevant for x86_64 and ppc64 platforms --without-depcheck Skip Distro's libraries check -v|-vv|-vvv. Set verbosity level -q. Set quiet - no messages will be printed --all|--hpc|--basic Install all,hpc or basic packages correspondingly Cheers, DK From Brian.Murrell at Sun.COM Tue Apr 29 06:58:30 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Tue, 29 Apr 2008 09:58:30 -0400 Subject: [ofa-general] install.sh question In-Reply-To: <481727BC.7000709@mediaweb.com> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> <481652F6.50008@mediaweb.com> <481708E4.4070306@mellanox.co.il> <481727BC.7000709@mediaweb.com> Message-ID: <1209477510.16768.45.camel@pc.ilinx> On Tue, 2008-04-29 at 06:50 -0700, DK Smith wrote: > > The new installer appears to be restricted to native installations. Is > this the case? If so, isn't this also a problem for other people? 
If by this you mean "isn't it a problem for other people that the mode of operations is that you have to build RPMs and then install them on the build system in order to build other RPMs", _absolutely_ this is a problem! I am not allowed to "pollute" the pristine build system by installing random RPMs into it. To that end, I have created a build system for OFED 1.3 that builds and installs (and builds and installs, and builds and installs) all of the RPMs into a "tree" that I am free to make in my $HOME. It does not work as smoothly as it could/should due to some (what I consider) breakage in the packages themselves, but I seem to have been able to work around the breakage that I have found. I don't yet build all RPMs either though. I've only built enough to get me the test tools I needed at the time. But IMHO, the OFED build system should be able to complete building all RPMs without a) needing to be root and b) having to install intermediate RPMs on the build system. AFAIK, it can/does not do this currently. Cheers, b. From dks at mediaweb.com Tue Apr 29 07:56:17 2008 From: dks at mediaweb.com (DK Smith) Date: Tue, 29 Apr 2008 07:56:17 -0700 Subject: [ofa-general] install.sh question In-Reply-To: <1209477510.16768.45.camel@pc.ilinx> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> <481652F6.50008@mediaweb.com> <481708E4.4070306@mellanox.co.il> <1209477510.16768.45.camel@pc.ilinx> Message-ID: <48173711.9040007@mediaweb.com> Brian J. Murrell wrote: > On Tue, 2008-04-29 at 06:50 -0700, DK Smith wrote: >> The new installer appears to be restricted to native installations. Is >> this the case? If so, isn't this also a problem for other people?
> > But IMHO, the OFED build system should be able to complete building all > RPMs without a) needing to be root and b) having to install intermediate > RPMs on the build system. AFAIK, it can/does not do this currently. > Hi and thanks for adding your voice to the chorus, err, I mean, duet. I did not want to seem ungrateful by complaining, too, about the root user thing. But seriously, how dangerous is that? ... to run some massive perl script of unknown quality as root. My own paranoia was beginning to make me worried. Am I too paranoid? LOL! So I appreciate this being mentioned, explicitly. :) Personally, I do not have good experiences with some vendor-supplied installer programs that claim that they must be run as root. As an example, take QLogic's buggy installer for the QLAxxxx FC product. The utility of Make seems to be often overlooked and/or under-used or simply misused. I assume that writing a custom installer is a big decision and commitment. So then why not make such a large investment able to relocate the build output to another part of a file system? Or use the facilities that Make provides to design such functionality? The kernel build process seems to be a working model of such, that could be copied. A lot of flexibility can be achieved by parameterizing the K_VER and MODULES_INSTALL_DIR variables. Is there something specific to OFED that makes this sort of flexibility impossible? Cheers, DK From jackm at dev.mellanox.co.il Tue Apr 29 08:22:57 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 29 Apr 2008 18:22:57 +0300 Subject: [ofa-general] [PATCH] ib_mthca: use log values instead of numeric values when specifying HCA resource maxes in module parameters Message-ID: <200804291822.57820.jackm@dev.mellanox.co.il> ib_mthca: change all HCA resource module parameters to be log values.
Module parameters for overriding driver default maximum HCA resource quantities should be log values, not numeric values -- since these quantities should all be powers-of-2 anyway. Signed-off-by: Jack Morgenstein --- Roland, This is for kernel 2.6.26. (I generated it against your for-2.6.26 git tree). I put a check in the patch for detecting if the user specified a log or not, to make the transition from the old method (of numbers instead of logs) easier. Maybe add such a check to the mlx4 version, too? Jack index 9ebadd6..c9f9bbe 100644 --- a/drivers/infiniband/hw/mthca/mthca_main.c +++ b/drivers/infiniband/hw/mthca/mthca_main.c @@ -99,37 +99,88 @@ static struct mthca_profile hca_profile = { .uarc_size = MTHCA_DEFAULT_NUM_UARC_SIZE, /* Arbel only */ }; -module_param_named(num_qp, hca_profile.num_qp, int, 0444); -MODULE_PARM_DESC(num_qp, "maximum number of QPs per HCA"); +static struct mthca_profile mod_param_profile = { 0 }; +module_param_named(num_qp, mod_param_profile.num_qp, int, 0444); +MODULE_PARM_DESC(num_qp, "log maximum number of QPs per HCA (default 16)"); -module_param_named(rdb_per_qp, hca_profile.rdb_per_qp, int, 0444); -MODULE_PARM_DESC(rdb_per_qp, "number of RDB buffers per QP"); +module_param_named(rdb_per_qp, mod_param_profile.rdb_per_qp, int, 0444); +MODULE_PARM_DESC(rdb_per_qp, "log number of RDB buffers per QP (default 2)"); -module_param_named(num_cq, hca_profile.num_cq, int, 0444); -MODULE_PARM_DESC(num_cq, "maximum number of CQs per HCA"); +module_param_named(num_cq, mod_param_profile.num_cq, int, 0444); +MODULE_PARM_DESC(num_cq, "log maximum number of CQs per HCA (default 16)"); -module_param_named(num_mcg, hca_profile.num_mcg, int, 0444); -MODULE_PARM_DESC(num_mcg, "maximum number of multicast groups per HCA"); +module_param_named(num_mcg, mod_param_profile.num_mcg, int, 0444); +MODULE_PARM_DESC(num_mcg, "log maximum number of multicast groups per HCA" + " (default 13)"); -module_param_named(num_mpt, hca_profile.num_mpt, int, 0444); 
+module_param_named(num_mpt, mod_param_profile.num_mpt, int, 0444); MODULE_PARM_DESC(num_mpt, - "maximum number of memory protection table entries per HCA"); + "log maximum number of memory protection table entries per HCA" + " (default 17)"); -module_param_named(num_mtt, hca_profile.num_mtt, int, 0444); +module_param_named(num_mtt, mod_param_profile.num_mtt, int, 0444); MODULE_PARM_DESC(num_mtt, - "maximum number of memory translation table segments per HCA"); + "log maximum number of memory translation table segments per" + " HCA (default 20)"); -module_param_named(num_udav, hca_profile.num_udav, int, 0444); -MODULE_PARM_DESC(num_udav, "maximum number of UD address vectors per HCA"); +module_param_named(num_udav, mod_param_profile.num_udav, int, 0444); +MODULE_PARM_DESC(num_udav, "log maximum number of UD address vectors per HCA" + " (default 15)"); -module_param_named(fmr_reserved_mtts, hca_profile.fmr_reserved_mtts, int, 0444); +module_param_named(fmr_reserved_mtts, mod_param_profile.fmr_reserved_mtts, + int, 0444); MODULE_PARM_DESC(fmr_reserved_mtts, - "number of memory translation table segments reserved for FMR"); + "log number of memory translation table segments reserved for" + " FMR (default 18)"); static char mthca_version[] __devinitdata = DRV_NAME ": Mellanox InfiniBand HCA driver v" DRV_VERSION " (" DRV_RELDATE ")\n"; +static void process_mod_param_profile(void) +{ + if (mod_param_profile.num_qp > 31 || + mod_param_profile.rdb_per_qp > 31 || + mod_param_profile.num_cq > 31 || + mod_param_profile.num_mcg > 31 || + mod_param_profile.num_mpt > 31 || + mod_param_profile.num_mtt > 31 || + mod_param_profile.num_udav > 31 || + mod_param_profile.fmr_reserved_mtts > 31) { + printk(KERN_WARNING PFX "Value of one or more HCA resource" + " module parameters exceeds 31.\n"); + printk(KERN_WARNING PFX "Are you specifying LOG values?\n"); + printk(KERN_WARNING PFX "Reverting to using max default values" + " for all HCA resources.\n"); + return; + } + + 
hca_profile.num_qp = (mod_param_profile.num_qp ? + 1 << mod_param_profile.num_qp : + hca_profile.num_qp); + hca_profile.rdb_per_qp = (mod_param_profile.rdb_per_qp ? + 1 << mod_param_profile.rdb_per_qp : + hca_profile.rdb_per_qp); + hca_profile.num_cq = (mod_param_profile.num_cq ? + 1 << mod_param_profile.num_cq : + hca_profile.num_cq); + hca_profile.num_mcg = (mod_param_profile.num_mcg ? + 1 << mod_param_profile.num_mcg : + hca_profile.num_mcg); + hca_profile.num_mpt = (mod_param_profile.num_mpt ? + 1 << mod_param_profile.num_mpt : + hca_profile.num_mpt); + hca_profile.num_mtt = (mod_param_profile.num_mtt ? + 1 << mod_param_profile.num_mtt : + hca_profile.num_mtt); + hca_profile.num_udav = (mod_param_profile.num_udav ? + 1 << mod_param_profile.num_udav : + hca_profile.num_udav); + hca_profile.fmr_reserved_mtts = (mod_param_profile.fmr_reserved_mtts ? + 1 << mod_param_profile.fmr_reserved_mtts : + hca_profile.fmr_reserved_mtts); +} + static int mthca_tune_pci(struct mthca_dev *mdev) { if (!tune_pci) @@ -1364,6 +1415,7 @@ static int __init mthca_init(void) { int ret; + process_mod_param_profile(); mthca_validate_profile(); ret = mthca_catas_init(); From andrea at qumranet.com Tue Apr 29 08:30:52 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 29 Apr 2008 17:30:52 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> Message-ID: <20080429153052.GE8315@duo.random> On Mon, Apr 28, 2008 at 06:28:06PM -0700, Christoph Lameter wrote: > On Tue, 29 Apr 2008, Andrea Arcangeli wrote: > > > Frankly I've absolutely no idea why rcu is needed in all rmap code > > when walking the page->mapping. 
Definitely the PG_locked is taken so > there's no way page->mapping could possibly go away under the rmap > code, hence the anon_vma can't go away as it's queued in the vma, and > the vma has to go away before the page is zapped out of the pte. > > zap_pte_range can race with the rmap code and it does not take the page > lock. The page may not go away since a refcount was taken but the mapping > can go away. Without RCU you have no guarantee that the anon_vma is > existing when you take the lock. There's some room for improvement, like using down_read_trylock: if that succeeds we don't need to increase the refcount and we can keep the rcu_read_lock held instead. Secondly, we don't need to increase the refcount in fork() when we queue the vma-copy in the anon_vma. You should init the refcount to 1 when the anon_vma is allocated, remove the atomic_inc from all code (except when down_read_trylock fails) and then change anon_vma_unlink to: up_write(&anon_vma->sem); if (empty) put_anon_vma(anon_vma); While the down_read_trylock surely won't help in AIM, the second change will reduce a bit the overhead in the VM core fast paths by avoiding all refcounting changes, by checking list_empty the same way the current code does. I really like how I designed the garbage collection through list_empty; that's efficient and I'd like to keep it. I however doubt this will bring us back to the same performance as the current spinlock version, as the real overhead should come out of overscheduling in down_write at anon_vma_link. Here an initially spinning lock would help, but that's a gray area: it greatly depends on timings, and on very large systems where a cacheline wait with many cpus forking at the same time takes more than scheduling, a semaphore may not slow down performance that much.
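Andrea's refcount scheme can be sketched in plain C11. This is a userspace illustration, not kernel code: the names (`fake_anon_vma`, `get_anon_vma`, `put_anon_vma`) and the `freed` flag are invented stand-ins. The point of the scheme is that the object starts with a reference of 1 held by its VMA list, the `down_read_trylock` fast path never touches the count at all, and the final put in `anon_vma_unlink` (once the list is empty) frees the structure:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's anon_vma; the "freed" flag
 * replaces kmem_cache_free() so the lifecycle can be observed. */
struct fake_anon_vma {
    atomic_int refcount;  /* initialized to 1 at allocation */
    int freed;
};

static void anon_vma_ctor(struct fake_anon_vma *av)
{
    atomic_init(&av->refcount, 1);  /* reference held by the vma list */
    av->freed = 0;
}

/* Slow path only: taken when down_read_trylock() fails and the reader
 * must sleep, so it pins the object across rcu_read_unlock(). */
static void get_anon_vma(struct fake_anon_vma *av)
{
    atomic_fetch_add(&av->refcount, 1);
}

static void put_anon_vma(struct fake_anon_vma *av)
{
    /* atomic_fetch_sub returns the old value: 1 means last reference */
    if (atomic_fetch_sub(&av->refcount, 1) == 1)
        av->freed = 1;  /* stand-in for freeing the structure */
}
```

The fast path (trylock succeeding under rcu_read_lock) calls neither get nor put, which is exactly why it avoids the refcount cacheline traffic Andrea wants to keep out of the fork path.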
So I think the only way is a configuration option to switch the locking at compile time; XPMEM will then depend on that option being on. I don't see a big deal, and this guarantees embedded isn't screwed up by totally unnecessary locks on UP. From HNGUYEN at de.ibm.com Tue Apr 29 08:40:52 2008 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Tue, 29 Apr 2008 17:40:52 +0200 Subject: [ofa-general][PATCH] Re: mlx4: Completion EQ per cpu (MP support, Patch 10) Message-ID: Hi Roland! >> Each CQ is attached to an EQ and receives its completion interrupts from that EQ. >> >> CQ and EQ are not per port. >> >> Implementing this in the device layer allows all ULPs to use the feature. We do not expose an EQ allocation API, because there is no point creating more EQs >> than CPUs. >CQ are not per port but netdevices are bound to a port (it's correct that >a few of them can be bound to the same port, e.g. with different PKEYs or >VLAN tags); maybe it's worth thinking of an API that either lets the ULP >dictate to which CPU/core they want the EQ serving this CQ to direct its >interrupts, or, if the ULP doesn't care, lets the driver allocate that in >round-robin fashion. We've had some ehca code doing a round-robin scheme, which is an ehca-specific policy. Do you have any thoughts on the approach you want to pursue? Will it be 2.6.26 or 2.6.27 instead?
Thanks Nam From ossrosch at linux.vnet.ibm.com Tue Apr 29 08:44:15 2008 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Tue, 29 Apr 2008 17:44:15 +0200 Subject: [ofa-general] [PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs Message-ID: <200804291744.17235.ossrosch@linux.vnet.ibm.com> Signed-off-by: Stefan Roscher --- drivers/infiniband/hw/ehca/ehca_classes.h | 5 ++++ drivers/infiniband/hw/ehca/ehca_cq.c | 10 ++++++++ drivers/infiniband/hw/ehca/ehca_main.c | 36 +++++++++++++++++++++++++++- drivers/infiniband/hw/ehca/ehca_qp.c | 10 ++++++++ 4 files changed, 59 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 3d6d946..00bab60 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -66,6 +66,7 @@ struct ehca_av; #include "ehca_irq.h" #define EHCA_EQE_CACHE_SIZE 20 +#define EHCA_MAX_NUM_QUEUES 0xffff struct ehca_eqe_cache_entry { struct ehca_eqe *eqe; @@ -127,6 +128,8 @@ struct ehca_shca { /* MR pgsize: bit 0-3 means 4K, 64K, 1M, 16M respectively */ u32 hca_cap_mr_pgsize; int max_mtu; + atomic_t num_cqs; + atomic_t num_qps; }; struct ehca_pd { @@ -344,6 +347,8 @@ extern int ehca_use_hp_mr; extern int ehca_scaling_code; extern int ehca_lock_hcalls; extern int ehca_nr_ports; +extern int ehca_max_cq; +extern int ehca_max_qp; struct ipzu_queue_resp { u32 qe_size; /* queue entry size */ diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index ec0cfcf..5b4f9a3 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -132,6 +132,14 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, if (cqe >= 0xFFFFFFFF - 64 - additional_cqe) return ERR_PTR(-EINVAL); + if (atomic_read(&shca->num_cqs) >= ehca_max_cq) { + ehca_err(device, "Unable to create CQ, max number of %i " + "CQs reached.", ehca_max_cq); 
+ ehca_err(device, "To increase the maximum number of CQs " + "use the number_of_cqs module parameter.\n"); + return ERR_PTR(-ENOSPC); + } + my_cq = kmem_cache_zalloc(cq_cache, GFP_KERNEL); if (!my_cq) { ehca_err(device, "Out of memory for ehca_cq struct device=%p", @@ -286,6 +294,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, } } + atomic_inc(&shca->num_cqs); return cq; create_cq_exit4: @@ -359,6 +368,7 @@ int ehca_destroy_cq(struct ib_cq *cq) ipz_queue_dtor(NULL, &my_cq->ipz_queue); kmem_cache_free(cq_cache, my_cq); + atomic_dec(&shca->num_cqs); return 0; } diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 6504897..401907f 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -68,6 +68,8 @@ int ehca_port_act_time = 30; int ehca_static_rate = -1; int ehca_scaling_code = 0; int ehca_lock_hcalls = -1; +int ehca_max_cq = -1; +int ehca_max_qp = -1; module_param_named(open_aqp1, ehca_open_aqp1, bool, S_IRUGO); module_param_named(debug_level, ehca_debug_level, int, S_IRUGO); @@ -79,6 +81,8 @@ module_param_named(poll_all_eqs, ehca_poll_all_eqs, bool, S_IRUGO); module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); module_param_named(scaling_code, ehca_scaling_code, bool, S_IRUGO); module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); +module_param_named(number_of_cqs, ehca_max_cq, int, S_IRUGO); +module_param_named(number_of_qps, ehca_max_qp, int, S_IRUGO); MODULE_PARM_DESC(open_aqp1, "Open AQP1 on startup (default: no)"); @@ -104,6 +108,12 @@ MODULE_PARM_DESC(scaling_code, MODULE_PARM_DESC(lock_hcalls, "Serialize all hCalls made by the driver " "(default: autodetect)"); +MODULE_PARM_DESC(number_of_cqs, + "Max number of CQs which can be allocated " + "(default: autodetect)"); +MODULE_PARM_DESC(number_of_qps, + "Max number of QPs which can be allocated " + "(default: autodetect)"); DEFINE_RWLOCK(ehca_qp_idr_lock); 
DEFINE_RWLOCK(ehca_cq_idr_lock); @@ -355,6 +365,25 @@ static int ehca_sense_attributes(struct ehca_shca *shca) if (rblock->memory_page_size_supported & pgsize_map[i]) shca->hca_cap_mr_pgsize |= pgsize_map[i + 1]; + /* Set maximum number of CQs and QPs to calculate EQ size */ + if (ehca_max_qp == -1) + ehca_max_qp = min_t(int, rblock->max_qp, EHCA_MAX_NUM_QUEUES); + else if (ehca_max_qp < 1 || ehca_max_qp > rblock->max_qp) { + ehca_gen_err("Requested number of QPs is out of range (1 - %i) " + "specified by HW", rblock->max_qp); + ret = -EINVAL; + goto sense_attributes1; + } + + if (ehca_max_cq == -1) + ehca_max_cq = min_t(int, rblock->max_cq, EHCA_MAX_NUM_QUEUES); + else if (ehca_max_cq < 1 || ehca_max_cq > rblock->max_cq) { + ehca_gen_err("Requested number of CQs is out of range (1 - %i) " + "specified by HW", rblock->max_cq); + ret = -EINVAL; + goto sense_attributes1; + } + /* query max MTU from first port -- it's the same for all ports */ port = (struct hipz_query_port *)rblock; h_ret = hipz_h_query_port(shca->ipz_hca_handle, 1, port); @@ -684,7 +713,7 @@ static int __devinit ehca_probe(struct of_device *dev, struct ehca_shca *shca; const u64 *handle; struct ib_pd *ibpd; - int ret, i; + int ret, i, eq_size; handle = of_get_property(dev->node, "ibm,hca-handle", NULL); if (!handle) { @@ -705,6 +734,8 @@ static int __devinit ehca_probe(struct of_device *dev, return -ENOMEM; } mutex_init(&shca->modify_mutex); + atomic_set(&shca->num_cqs, 0); + atomic_set(&shca->num_qps, 0); for (i = 0; i < ARRAY_SIZE(shca->sport); i++) spin_lock_init(&shca->sport[i].mod_sqp_lock); @@ -724,8 +755,9 @@ static int __devinit ehca_probe(struct of_device *dev, goto probe1; } + eq_size = 2 * ehca_max_cq + 4 * ehca_max_qp; /* create event queues */ - ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048); + ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, eq_size); if (ret) { ehca_err(&shca->ib_device, "Cannot create EQ."); goto probe1; diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c 
b/drivers/infiniband/hw/ehca/ehca_qp.c index 57bef11..73d9c4a 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -421,6 +421,14 @@ static struct ehca_qp *internal_create_qp( u32 swqe_size = 0, rwqe_size = 0, ib_qp_num; unsigned long flags; + if (atomic_read(&shca->num_qps) >= ehca_max_qp) { + ehca_err(pd->device, "Unable to create QP, max number of %i " + "QPs reached.", ehca_max_qp); + ehca_err(pd->device, "To increase the maximum number of QPs " + "use the number_of_qps module parameter.\n"); + return ERR_PTR(-ENOSPC); + } + if (init_attr->create_flags) return ERR_PTR(-EINVAL); @@ -797,6 +805,7 @@ static struct ehca_qp *internal_create_qp( } } + atomic_inc(&shca->num_qps); return my_qp; create_qp_exit6: @@ -1948,6 +1957,7 @@ static int internal_destroy_qp(struct ib_device *dev, struct ehca_qp *my_qp, if (HAS_SQ(my_qp)) ipz_queue_dtor(my_pd, &my_qp->ipz_squeue); kmem_cache_free(qp_cache, my_qp); + atomic_dec(&shca->num_qps); return 0; } -- 1.5.5 From holt at sgi.com Tue Apr 29 08:50:30 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 29 Apr 2008 10:50:30 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080429153052.GE8315@duo.random> References: <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> <20080429153052.GE8315@duo.random> Message-ID: <20080429155030.GB28944@sgi.com> > I however doubt this will bring us back to the same performance of the > current spinlock version, as the real overhead should come out of > overscheduling in down_write ai anon_vma_link. 
Here an initially > spinning lock would help but that's gray area, it greatly depends on > timings, and on very large systems where a cacheline wait with many > cpus forking at the same time takes more than scheduling a semaphore > may not slowdown performance that much. So I think the only way is a > configuration option to switch the locking at compile time, then XPMEM > will depend on that option to be on, I don't see a big deal and this > guarantees embedded isn't screwed up by totally unnecessary locks on UP. You have said this continually about a CONFIG option. I am unsure how that could be achieved. Could you provide a patch? Thanks, Robin From andrea at qumranet.com Tue Apr 29 09:03:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 29 Apr 2008 18:03:40 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080429155030.GB28944@sgi.com> References: <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> <20080429153052.GE8315@duo.random> <20080429155030.GB28944@sgi.com> Message-ID: <20080429160340.GG8315@duo.random> On Tue, Apr 29, 2008 at 10:50:30AM -0500, Robin Holt wrote: > You have said this continually about a CONFIG option. I am unsure how > that could be achieved. Could you provide a patch? I'm busy with the reserved ram patch against 2.6.25 and latest kvm.git that is moving from pages to pfn for pci passthrough (that change will also remove the page pin with mmu notifiers). Unfortunately reserved-ram bugs out again in the blk-settings.c on real hardware. 
The fix I pushed in .25 for it works when booting kvm (that's how I tested it), but on real hardware the sata b_pfn happens to be 1 page less than the result of the min comparison and I'll have to figure out what happens (only the .24 code works on real hardware...; at least my fix is surely better than the previous .25-pre code). I've other people waiting on that reserved-ram to be working, so once I've finished, I'll do the optimization to anon-vma (at least the removal of the unnecessary atomic_inc from fork) and add the config option. Christoph, if you're interested in evolving anon-vma-sem and i_mmap_sem yourself in this direction, you're very welcome to go ahead while I finish sorting out reserved-ram. If you do, please let me know so we don't duplicate effort, and it'd be absolutely great if the patches could be incremental with #v14 so I can merge them trivially later and upload a new patchset once you're finished (the only outstanding fix you have to apply on top of #v14 that is already integrated in my patchset is the i_mmap_sem deadlock fix I posted, which I'm sure you've already applied on top of #v14 before doing any more development on it). Thanks! From rdreier at cisco.com Tue Apr 29 09:36:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 09:36:32 -0700 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: <200804290959.41881.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 29 Apr 2008 09:59:41 +0300") References: <200804241106.57172.okir@lst.de> <200804241109.52448.okir@lst.de> <200804290959.41881.jackm@dev.mellanox.co.il> Message-ID: > We concluded at that time that the patch was OK. > > I also reviewed the patch again (especially the Sinai optimization), and the patch is OK > there, too: Thanks for the really detailed explanation.
I'll apply Olaf's patch for 2.6.26. From michael.heinz at qlogic.com Tue Apr 29 09:37:41 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 11:37:41 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA Message-ID: I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. Are older HCAs supported with OFED, or are only Arbel and Connect-X type HCAs usable? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From boris at mellanox.com Tue Apr 29 09:42:04 2008 From: boris at mellanox.com (Boris Shpolyansky) Date: Tue, 29 Apr 2008 09:42:04 -0700 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: Message-ID: <1E3DCD1C63492545881FACB6063A57C10257C6DE@mtiexch01.mti.com> Hi Michael, MT23108 HCAs are supported. Please update the FW on your card to the latest version available from the Mellanox web site at http://www.mellanox.com/support/firmware_table_IH.php Follow the FW burning instructions provided there. Regards, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc.
2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Mike Heinz Sent: Tuesday, April 29, 2008 9:38 AM To: general at lists.openfabrics.org Subject: [ofa-general] Can't Initialize an MT23108 HCA I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older, MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. Are older HCAs supported with OFED or can only Arbel and Connect-X type HCAs usable? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Tue Apr 29 09:43:42 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 09:43:42 -0700 Subject: [ofa-general] Re: [PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs In-Reply-To: <200804291744.17235.ossrosch@linux.vnet.ibm.com> (Stefan Roscher's message of "Tue, 29 Apr 2008 17:44:15 +0200") References: <200804291744.17235.ossrosch@linux.vnet.ibm.com> Message-ID: > > Signed-off-by: Stefan Roscher Kind of an inadequate changelog ;) Is this a fix or an enhancement or what? > + if (atomic_read(&shca->num_cqs) >= ehca_max_cq) { > + if (atomic_read(&shca->num_qps) >= ehca_max_qp) { These are racy in the sense that multiple simultaneous calls to create_cq/create_qp might end up exceeding the ehca_max_cq limit. Is that an issue? You could close the race by using atomic_add_unless() and testing the return value (and being careful to do atomic_dec() on error paths after you bump num_cqs/num_qps). - R. From rdreier at cisco.com Tue Apr 29 09:48:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 09:48:25 -0700 Subject: [ofa-general] Re: [PATCH] ib_mthca: use log values instead of numeric values when specifiying HCA resource maxes in module parameters In-Reply-To: <200804291822.57820.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 29 Apr 2008 18:22:57 +0300") References: <200804291822.57820.jackm@dev.mellanox.co.il> Message-ID: > Module parameters for overriding driver default maximum HCA resource > quantities should be log values, not numeric values -- since these > quantities should all be powers-of-2 anyway. Hmm, that's a creative answer to my objection about the mlx4 interface. given that mthca has had the old interface for nearly a year and a half, what do we gain from changing it now? > I put a check in the patch for detecting if the user specified a log or not, > to make the transition from the old method (of numbers instead of logs) > easier. 
Yes, that is nice. Would the plan be just to allow both methods? > Maybe add such a check to the mlx4 version, too? Definitely, and I think the mlx4 module parameter names should match too. But then it would make sense for mlx4 to allow setting parameter values by value and not by log, and then we end up with all the same code in both places, and so why not just have mlx4 set by value the same way as mthca? - R. From rdreier at cisco.com Tue Apr 29 10:02:44 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:02:44 -0700 Subject: [ofa-general] Re: [ PATCH 3/3 ] RDMA/nes SFP+ cleanup In-Reply-To: <200804290426.m3T4QhJl018196@velma.neteffect.com> (Glenn Streiff's message of "Mon, 28 Apr 2008 23:26:43 -0500") References: <200804290426.m3T4QhJl018196@velma.neteffect.com> Message-ID: > Clean up the SFP+ patch. Why send a patch and then immediately a cleanup? Why not just clean the original patch? > - if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { > + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ mac_index ] != NES_PHY_TYPE_PUMA_1G)) { This type of change isn't a cleanup... kernel style prefers array[index] to array[ index ] and it seems most of this patch is making the change to the less-good way? - R. 
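To make the log-vs-numeric discussion in the mthca thread above concrete, the conversion reduces to a few lines. This is an illustrative userspace sketch, not the driver code: `param_to_count` and `MTHCA_MAX_LOG` are invented names, and where Jack's patch warns and reverts all resource values to defaults if any one parameter exceeds 31, this sketch handles a single parameter at a time:

```c
#include <assert.h>

/* A value above 31 cannot be a log2 of a 32-bit resource count, so it
 * is almost certainly an old-style absolute value given by mistake. */
#define MTHCA_MAX_LOG 31

/* Return the resource count for a log-valued module parameter:
 * 0 (or negative) means "unset, keep the driver default"; anything
 * above MTHCA_MAX_LOG falls back to the default as well. */
static int param_to_count(int log_param, int driver_default)
{
    if (log_param <= 0 || log_param > MTHCA_MAX_LOG)
        return driver_default;
    return 1 << log_param;  /* log value -> power-of-2 count */
}
```

The detection trick works only because every legal log value fits in [1, 31]; any old-style count (e.g. 65536 QPs) is far outside that range, so the transition check can tell the two conventions apart without a new parameter name.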
From pw at osc.edu Tue Apr 29 10:05:16 2008 From: pw at osc.edu (Pete Wyckoff) Date: Tue, 29 Apr 2008 13:05:16 -0400 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> <20080416144830.GC23861@osc.edu> <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> Message-ID: <20080429170516.GA8857@osc.edu> dorfman.eli at gmail.com wrote on Thu, 17 Apr 2008 14:13 +0300: > On Wed, Apr 16, 2008 at 6:46 PM, Roland Dreier wrote: > > > Agree with the interpretation of the spec, and it's probably a bit > > > clearer that way too. But we have working initiators and targets > > > that do it the "wrong" way. > > > > Yes... I guess the key question is whether there are any initiators that > > do things the "right" way. > > > > > > > 1. Flag day: all initiators and targets change at the same time. > > > Will see data corruption if someone unluckily runs one or the other > > > using old non-fixed code. > > > > Seems unacceptable to me... it doesn't make sense at all to break every > > setup in the world just to be "right" according to the spec. > > This will break only when both initiator and target will use > InitialR2T=No, which means allow unsolicited data. > As far as I know, STGT is not very common (and its version in RHEL5.1 > is considered experimental). Its default is also InitialR2T=Yes. > Voltaire's iSCSI over iSER target also uses default InitialR2T=Yes. > So it seems that nothing will break. I finally got a chance to look at this just now. I think you mean default is InitialR2T=No above, which means no unsolicited data. That is the default case, and true, the two different meanings of the initiator-supplied VA coincide. But you missed the impact of immediate data. 
We run with the defaults (I think) that say the first write request packet should be filled with a bit of the coming data stream. From iscsid.conf: # To enable immediate data (i.e., the initiator sends unsolicited data # with the iSCSI command packet), uncomment the following line: # # The default is Yes node.session.iscsi.ImmediateData = Yes Looking at the offset printed out by your patch, it is indeed non-zero for the first RDMA read. Please correct me if I am mistaken about this---you must have tested all four variations of with and without the patches on initiator and target side, but I did not. Hence I am still a bit unhappy about having to deal with the fallout, with no way to detect it. For our local use, I'll keep an older version of stgt in use until we switch to a new kernel, then merge up the target side change. It is a bother, but I can deal with it. For other institutions, this lockstep upgrade requirement will not be obvious until they debug the resulting data corruption. Still, I do understand why it would be nice to conform to the spec, and it is maybe a bit cleaner that way too. Maybe you can help with the bug reports on stgt-devel during the transition, and maintain and publish a patch to let it work with old kernels. -- Pete From gstreiff at NetEffect.com Tue Apr 29 10:16:11 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Tue, 29 Apr 2008 12:16:11 -0500 Subject: [ofa-general] RE: [ PATCH 3/3 ] RDMA/nes SFP+ cleanup In-Reply-To: Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015C@venom2> > > Clean up the SFP+ patch. > > Why send a patch and then immediately a cleanup? Why not > just clean the > original patch? > > > - if ((nesadapter->OneG_Mode) && > (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { > > + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ > mac_index ] != NES_PHY_TYPE_PUMA_1G)) { > > This type of change isn't a cleanup... 
kernel style prefers > > array[index] > > to > > array[ index ] > > and it seems most of this patch is making the change to the > less-good way? > > - R. > My bad, on the array index idiom. I can redo. With regard to post-patch clean-ups, I recall you telling me that it was preferred to either front-load or back-load the cleanups in a patch series. I generally "cleaned-up" the entire functions rather than just the patched portion. If I do both together, then you'll get clean-up noise interspersed with functional deltas, making functional review somewhat annoying in my opinion. Will be happy to redo as a single SFP patch and drop the 3rd patch if that works better for you. In fact that is how I did it originally. :-) Glenn From rdreier at cisco.com Tue Apr 29 10:18:16 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:18:16 -0700 Subject: [ofa-general] Re: [ PATCH 3/3 ] RDMA/nes SFP+ cleanup In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015C@venom2> (Glenn Streiff's message of "Tue, 29 Apr 2008 12:16:11 -0500") References: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015C@venom2> Message-ID: > My bad, on the array index idiom. I can redo. Yes, please do resend without that. > With regard to post-patch clean-ups, I recall you telling me > that it was preferred to either front-load or back-load the > cleanups in a patch series. Yes, that is true. > I generally "cleaned-up" the entire functions rather than > just the patched portion. If I do both together, then you'll > get clean-up noise interspersed with functional deltas making > functional review somewhat annoying in my opinion. OK, got it. The changelog "Clean up the SFP+ patch." was misleading. - R.
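Roland's ehca review earlier in this digest suggested closing the create_cq/create_qp counter race with atomic_add_unless(). In C11 userspace terms that primitive is just a compare-and-swap loop; the names here (`add_unless`, `reserve_slot`) are illustrative stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stdatomic.h>

/* Add `a` to *v unless *v == u; return 1 if the add happened.
 * Mirrors the semantics of the kernel's atomic_add_unless(). */
static int add_unless(atomic_int *v, int a, int u)
{
    int cur = atomic_load(v);
    while (cur != u) {
        /* On CAS failure (including spurious ones), cur is reloaded
         * with the current value and the loop retries. */
        if (atomic_compare_exchange_weak(v, &cur, cur + a))
            return 1;
    }
    return 0;  /* already at the forbidden value: nothing added */
}

/* Race-free limit check: either reserves a slot or fails, so two
 * concurrent callers can never both slip past the limit the way a
 * plain atomic_read() test can.  Error paths must undo a successful
 * reservation with atomic_fetch_sub(counter, 1). */
static int reserve_slot(atomic_int *counter, int max)
{
    return add_unless(counter, 1, max) ? 0 : -1;  /* -1 ~ -ENOSPC */
}
```

The key difference from the racy `if (atomic_read(&n) >= max)` pattern is that the test and the increment happen as one atomic step, so the counter can never exceed `max` no matter how many creators run concurrently.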
From gstreiff at NetEffect.com Tue Apr 29 10:23:58 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Tue, 29 Apr 2008 12:23:58 -0500 Subject: [ofa-general] RE: [ewg] RE: [ PATCH 3/3 ] RDMA/nes SFP+ cleanup In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015C@venom2> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015D@venom2> > > > Clean up the SFP+ patch. > > > > Why send a patch and then immediately a cleanup? Why not > > just clean the > > original patch? > > > > > - if ((nesadapter->OneG_Mode) && > > (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { > > > + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ > > mac_index ] != NES_PHY_TYPE_PUMA_1G)) { > > > > This type of change isn't a cleanup... kernel style prefers > > > > array[index] > > > > to > > > > array[ index ] > > > > and it seems most of this patch is making the change to the > > less-good way? > > > > - R. > > > > My bad, on the array index idiom. I can redo. > > With regard to post-patch clean-ups, I recall you telling me > that it was preferred to either front-load or back-load the > cleanups in a patch series. > > I generally "cleaned-up" the entire functions rather than > just the patched portion. If I do both together, then you'll > get clean-up noise interspersed with functional deltas making > functional review somewhat annoying in my opinion. > Hmm...what I probably should have done was give a clean SFP patch and then add peripheral cleanups to the functions as a subsequent patch. I'll go down that path. Glenn From michael.heinz at qlogic.com Tue Apr 29 10:29:44 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 12:29:44 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: <1E3DCD1C63492545881FACB6063A57C10257C6DE@mtiexch01.mti.com> References: <1E3DCD1C63492545881FACB6063A57C10257C6DE@mtiexch01.mti.com> Message-ID: Hey, Boris, I am, indeed, running current firmware.
I'll try to isolate variables and see if I can focus on why this machine has a problem. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania ________________________________ From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Tuesday, April 29, 2008 12:42 PM To: Mike Heinz; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hi Michael, MT23108 HCAs are supported. Please, update FW on your card to the latest version available from Mellanox web site at http://www.mellanox.com/support/firmware_table_IH.php Follow FW burning instructions provided there. Regards, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Mike Heinz Sent: Tuesday, April 29, 2008 9:38 AM To: general at lists.openfabrics.org Subject: [ofa-general] Can't Initialize an MT23108 HCA I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older, MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. 
Are older HCAs supported with OFED, or are only Arbel and Connect-X type HCAs usable? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Apr 29 10:33:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:33:37 -0700 Subject: [ofa-general] Re: [PATCH] IB/iSER: Move high-volume debug output to higher debug levels In-Reply-To: <694d48600804280501q3cf74a10p2e1b73b4ac0d3d27@mail.gmail.com> (Eli Dorfman's message of "Mon, 28 Apr 2008 15:01:33 +0300") References: <694d48600804280501q3cf74a10p2e1b73b4ac0d3d27@mail.gmail.com> Message-ID: > +module_param_named(debug_level, iser_debug_level, int, > S_IRUGO|S_IWUSR|S_IWGRP); > +MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 > (default:disabled)"); In addition to being line-wrapped this looks really funny... why add S_IWGRP? The ownership of parameter files is root:root, so what do you get from changing from the current 0644 permissions? I applied the patch without this change; if there is a reason for this, please send the permission change separately. - R. From boris at mellanox.com Tue Apr 29 10:33:41 2008 From: boris at mellanox.com (Boris Shpolyansky) Date: Tue, 29 Apr 2008 10:33:41 -0700 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: Message-ID: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Hi Michael, You mentioned FW version 3.05, while the latest is 3.5.0. Please, verify. Boris ________________________________ From: Mike Heinz [mailto:michael.heinz at qlogic.com] Sent: Tuesday, April 29, 2008 10:30 AM To: Boris Shpolyansky; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hey, Boris, I am, indeed, running current firmware. I'll try to isolate variables and see if I can focus on why this machine has a problem. 
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania ________________________________ From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Tuesday, April 29, 2008 12:42 PM To: Mike Heinz; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hi Michael, MT23108 HCAs are supported. Please, update FW on your card to the latest version available from Mellanox web site at http://www.mellanox.com/support/firmware_table_IH.php Follow FW burning instructions provided there. Regards, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Mike Heinz Sent: Tuesday, April 29, 2008 9:38 AM To: general at lists.openfabrics.org Subject: [ofa-general] Can't Initialize an MT23108 HCA I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older, MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. Are older HCAs supported with OFED or can only Arbel and Connect-X type HCAs usable? 
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Apr 29 10:36:05 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:36:05 -0700 Subject: [ofa-general] Re: [PATCH] IB/iSER: Count fmr alignment violations per session In-Reply-To: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> (Eli Dorfman's message of "Tue, 29 Apr 2008 10:33:07 +0300") References: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> Message-ID: thanks, applied From michael.heinz at qlogic.com Tue Apr 29 10:46:20 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 12:46:20 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> References: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Message-ID: Boris, The HCA is a Silverstorm version that reports to QuickSilver as "3.05.0000rc01" which is equivalent to Mellanox "3.5.000". The "rc01" means it was the first (and only) OEM build of that firmware. I just compared the MLX files for the QuickSilver firmware and one I just downloaded from Mellanox and they are identical. I'm looking for another suitable machine to see if I get the same behavior. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania ________________________________ From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Tuesday, April 29, 2008 1:34 PM To: Mike Heinz; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hi Michael, You mentioned FW version 3.05, while the latest is 3.5.0. Please, verify. 
Boris ________________________________ From: Mike Heinz [mailto:michael.heinz at qlogic.com] Sent: Tuesday, April 29, 2008 10:30 AM To: Boris Shpolyansky; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hey, Boris, I am, indeed, running current firmware. I'll try to isolate variables and see if I can focus on why this machine has a problem. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania ________________________________ From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Tuesday, April 29, 2008 12:42 PM To: Mike Heinz; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hi Michael, MT23108 HCAs are supported. Please, update FW on your card to the latest version available from Mellanox web site at http://www.mellanox.com/support/firmware_table_IH.php Follow FW burning instructions provided there. Regards, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Mike Heinz Sent: Tuesday, April 29, 2008 9:38 AM To: general at lists.openfabrics.org Subject: [ofa-general] Can't Initialize an MT23108 HCA I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older, MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. 
When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. Are older HCAs supported with OFED or can only Arbel and Connect-X type HCAs usable? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Apr 29 10:48:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:48:32 -0700 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: (Mike Heinz's message of "Tue, 29 Apr 2008 12:46:20 -0500") References: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Message-ID: > I'm looking for another suitable machine to see if I get the same > behavior. What are the details of the machine where you're seeing this problem? I seem to recall some ancient Dell systems had problems with PCI-X HCAs not reappearing on PCI after an HCA reset. Also it might be worth checking that your BIOS is up-to-date. - R. From michael.heinz at qlogic.com Tue Apr 29 10:51:10 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 12:51:10 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: References: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Message-ID: It's an older Opteron box, circa 2004-2005 or so. I'm trying to find an Intel box I can test with. 
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Tuesday, April 29, 2008 1:49 PM To: Mike Heinz Cc: Boris Shpolyansky; general at lists.openfabrics.org Subject: Re: [ofa-general] Can't Initialize an MT23108 HCA > I'm looking for another suitable machine to see if I get the same > behavior. What are the details of the machine where you're seeing this problem? I seem to recall some ancient Dell systems had problems with PCI-X HCAs not reappearing on PCI after an HCA reset. Also it might be worth checking that your BIOS is up-to-date. - R. From rdreier at cisco.com Tue Apr 29 11:10:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:10:32 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: <1209370487.11248.1.camel@mtls03> (Eli Cohen's message of "Mon, 28 Apr 2008 11:14:47 +0300") References: <1209370487.11248.1.camel@mtls03> Message-ID: Thanks, applied, with some fixups -- this patch seemed to be against some tree I don't have, maybe OFED?? for example: > + if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) > + poll_tx(priv); > + > return; > > drop: I didn't see any version of ipoib_ib.c ever in the kernel tree that had a drop: label. - R. 
From rdreier at cisco.com Tue Apr 29 11:15:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:15:17 -0700 Subject: [ofa-general][PATCH] Re: mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: (Hoang-Nam Nguyen's message of "Tue, 29 Apr 2008 17:40:52 +0200") References: Message-ID: > >CQs are not per port, but netdevices are bound to a port (it's correct that > >a few of them can be bound to the same port, eg with different PKEYs or > >VLAN tags). Maybe it's worth thinking of an API that either lets the ULP > >dictate to which CPU/core it wants the EQ serving this CQ to direct its > >interrupts, or, if the ULP doesn't care, lets the driver allocate that in > >round-robin fashion. > We've had some ehca code doing a round-robin scheme, which is an ehca > specific > policy. > Do you have any thoughts on the approach you want to pursue? I would just like to see an approach that is fully thought through and gives a way for applications/kernel drivers to choose a CQ vector based on some information about what CPU it will go to. If we want to add a way to allow a request for round-robin, that is fine, but I don't think we want to change the default to round-robin, unless someone can come up with a workload where it actually helps. > Will it be 2.6.26 or 2.6.27 instead? Given that we always seem to start this discussion at the end of the merge window, and then no one follows up, it may be never... certainly not 2.6.26 at this point. - R. 
From ossrosch at linux.vnet.ibm.com Tue Apr 29 11:15:36 2008 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Tue, 29 Apr 2008 20:15:36 +0200 Subject: [ofa-general] [REPOST][PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs In-Reply-To: References: <200804291744.17235.ossrosch@linux.vnet.ibm.com> Message-ID: <200804292015.38321.ossrosch@linux.vnet.ibm.com> If a lot of QPs fall into Error state at once and the EQ of the respective HCA is too small, it might overrun, causing the eHCA driver to stop processing completion events and call application software's completion handlers, effectively causing traffic to stop. Fix this by limiting available QPs and CQs to a customizable max count, and determining EQ size based on these counts and a worst-case assumption. Signed-off-by: Stefan Roscher --- Reposted based on Roland's comments: - use atomic_add_unless instead of atomic_read - inf% changelog increase ;) drivers/infiniband/hw/ehca/ehca_classes.h | 5 ++++ drivers/infiniband/hw/ehca/ehca_cq.c | 11 +++++++++ drivers/infiniband/hw/ehca/ehca_main.c | 36 +++++++++++++++++++++++++++- drivers/infiniband/hw/ehca/ehca_qp.c | 26 +++++++++++++++++++- 4 files changed, 74 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 3d6d946..00bab60 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -66,6 +66,7 @@ struct ehca_av; #include "ehca_irq.h" #define EHCA_EQE_CACHE_SIZE 20 +#define EHCA_MAX_NUM_QUEUES 0xffff struct ehca_eqe_cache_entry { struct ehca_eqe *eqe; @@ -127,6 +128,8 @@ struct ehca_shca { /* MR pgsize: bit 0-3 means 4K, 64K, 1M, 16M respectively */ u32 hca_cap_mr_pgsize; int max_mtu; + atomic_t num_cqs; + atomic_t num_qps; }; struct ehca_pd { @@ -344,6 +347,8 @@ extern int ehca_use_hp_mr; extern int ehca_scaling_code; extern int ehca_lock_hcalls; extern int ehca_nr_ports; +extern int ehca_max_cq; 
+extern int ehca_max_qp; struct ipzu_queue_resp { u32 qe_size; /* queue entry size */ diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index ec0cfcf..5540b27 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -132,10 +132,19 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, if (cqe >= 0xFFFFFFFF - 64 - additional_cqe) return ERR_PTR(-EINVAL); + if (!atomic_add_unless(&shca->num_cqs, 1, ehca_max_cq)) { + ehca_err(device, "Unable to create CQ, max number of %i " + "CQs reached.", ehca_max_cq); + ehca_err(device, "To increase the maximum number of CQs " + "use the number_of_cqs module parameter.\n"); + return ERR_PTR(-ENOSPC); + } + my_cq = kmem_cache_zalloc(cq_cache, GFP_KERNEL); if (!my_cq) { ehca_err(device, "Out of memory for ehca_cq struct device=%p", device); + atomic_dec(&shca->num_cqs); return ERR_PTR(-ENOMEM); } @@ -305,6 +314,7 @@ create_cq_exit2: create_cq_exit1: kmem_cache_free(cq_cache, my_cq); + atomic_dec(&shca->num_cqs); return cq; } @@ -359,6 +369,7 @@ int ehca_destroy_cq(struct ib_cq *cq) ipz_queue_dtor(NULL, &my_cq->ipz_queue); kmem_cache_free(cq_cache, my_cq); + atomic_dec(&shca->num_cqs); return 0; } diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 6504897..482103e 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -68,6 +68,8 @@ int ehca_port_act_time = 30; int ehca_static_rate = -1; int ehca_scaling_code = 0; int ehca_lock_hcalls = -1; +int ehca_max_cq = -1; +int ehca_max_qp = -1; module_param_named(open_aqp1, ehca_open_aqp1, bool, S_IRUGO); module_param_named(debug_level, ehca_debug_level, int, S_IRUGO); @@ -79,6 +81,8 @@ module_param_named(poll_all_eqs, ehca_poll_all_eqs, bool, S_IRUGO); module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); module_param_named(scaling_code, ehca_scaling_code, bool, S_IRUGO); 
module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); +module_param_named(number_of_cqs, ehca_max_cq, int, S_IRUGO); +module_param_named(number_of_qps, ehca_max_qp, int, S_IRUGO); MODULE_PARM_DESC(open_aqp1, "Open AQP1 on startup (default: no)"); @@ -104,6 +108,12 @@ MODULE_PARM_DESC(scaling_code, MODULE_PARM_DESC(lock_hcalls, "Serialize all hCalls made by the driver " "(default: autodetect)"); +MODULE_PARM_DESC(number_of_cqs, + "Max number of CQs which can be allocated " + "(default: autodetect)"); +MODULE_PARM_DESC(number_of_qps, + "Max number of QPs which can be allocated " + "(default: autodetect)"); DEFINE_RWLOCK(ehca_qp_idr_lock); DEFINE_RWLOCK(ehca_cq_idr_lock); @@ -355,6 +365,25 @@ static int ehca_sense_attributes(struct ehca_shca *shca) if (rblock->memory_page_size_supported & pgsize_map[i]) shca->hca_cap_mr_pgsize |= pgsize_map[i + 1]; + /* Set maximum number of CQs and QPs to calculate EQ size */ + if (ehca_max_qp == -1) + ehca_max_qp = min_t(int, rblock->max_qp, EHCA_MAX_NUM_QUEUES); + else if (ehca_max_qp < 1 || ehca_max_qp > rblock->max_qp) { + ehca_gen_err("Requested number of QPs is out of range (1 - %i) " + "specified by HW", rblock->max_qp); + ret = -EINVAL; + goto sense_attributes1; + } + + if (ehca_max_cq == -1) + ehca_max_cq = min_t(int, rblock->max_cq, EHCA_MAX_NUM_QUEUES); + else if (ehca_max_cq < 1 || ehca_max_cq > rblock->max_cq) { + ehca_gen_err("Requested number of CQs is out of range (1 - %i) " + "specified by HW", rblock->max_cq); + ret = -EINVAL; + goto sense_attributes1; + } + /* query max MTU from first port -- it's the same for all ports */ port = (struct hipz_query_port *)rblock; h_ret = hipz_h_query_port(shca->ipz_hca_handle, 1, port); @@ -684,7 +713,7 @@ static int __devinit ehca_probe(struct of_device *dev, struct ehca_shca *shca; const u64 *handle; struct ib_pd *ibpd; - int ret, i; + int ret, i, eq_size; handle = of_get_property(dev->node, "ibm,hca-handle", NULL); if (!handle) { @@ -705,6 +734,8 @@ static int 
__devinit ehca_probe(struct of_device *dev, return -ENOMEM; } mutex_init(&shca->modify_mutex); + atomic_set(&shca->num_cqs, 0); + atomic_set(&shca->num_qps, 0); for (i = 0; i < ARRAY_SIZE(shca->sport); i++) spin_lock_init(&shca->sport[i].mod_sqp_lock); @@ -724,8 +755,9 @@ static int __devinit ehca_probe(struct of_device *dev, goto probe1; } + eq_size = 2 * ehca_max_cq + 4 * ehca_max_qp; /* create event queues */ - ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048); + ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, eq_size); if (ret) { ehca_err(&shca->ib_device, "Cannot create EQ."); goto probe1; diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index 57bef11..18fba92 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -421,8 +421,18 @@ static struct ehca_qp *internal_create_qp( u32 swqe_size = 0, rwqe_size = 0, ib_qp_num; unsigned long flags; - if (init_attr->create_flags) + if (!atomic_add_unless(&shca->num_qps, 1, ehca_max_qp)) { + ehca_err(pd->device, "Unable to create QP, max number of %i " + "QPs reached.", ehca_max_qp); + ehca_err(pd->device, "To increase the maximum number of QPs " + "use the number_of_qps module parameter.\n"); + return ERR_PTR(-ENOSPC); + } + + if (init_attr->create_flags) { + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); + } memset(&parms, 0, sizeof(parms)); qp_type = init_attr->qp_type; @@ -431,6 +441,7 @@ static struct ehca_qp *internal_create_qp( init_attr->sq_sig_type != IB_SIGNAL_ALL_WR) { ehca_err(pd->device, "init_attr->sg_sig_type=%x not allowed", init_attr->sq_sig_type); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } @@ -455,6 +466,7 @@ static struct ehca_qp *internal_create_qp( if (is_llqp && has_srq) { ehca_err(pd->device, "LLQPs can't have an SRQ"); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } @@ -466,6 +478,7 @@ static struct ehca_qp *internal_create_qp( ehca_err(pd->device, "no more than three SGEs " "supported 
for SRQ pd=%p max_sge=%x", pd, init_attr->cap.max_recv_sge); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } } @@ -477,6 +490,7 @@ static struct ehca_qp *internal_create_qp( qp_type != IB_QPT_SMI && qp_type != IB_QPT_GSI) { ehca_err(pd->device, "wrong QP Type=%x", qp_type); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } @@ -490,6 +504,7 @@ static struct ehca_qp *internal_create_qp( "or max_rq_wr=%x for RC LLQP", init_attr->cap.max_send_wr, init_attr->cap.max_recv_wr); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } break; @@ -497,6 +512,7 @@ static struct ehca_qp *internal_create_qp( if (!EHCA_BMASK_GET(HCA_CAP_UD_LL_QP, shca->hca_cap)) { ehca_err(pd->device, "UD LLQP not supported " "by this adapter"); + atomic_dec(&shca->num_qps); return ERR_PTR(-ENOSYS); } if (!(init_attr->cap.max_send_sge <= 5 @@ -508,20 +524,22 @@ static struct ehca_qp *internal_create_qp( "or max_recv_sge=%x for UD LLQP", init_attr->cap.max_send_sge, init_attr->cap.max_recv_sge); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } else if (init_attr->cap.max_send_wr > 255) { ehca_err(pd->device, "Invalid Number of " "max_send_wr=%x for UD QP_TYPE=%x", init_attr->cap.max_send_wr, qp_type); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } break; default: ehca_err(pd->device, "unsupported LL QP Type=%x", qp_type); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); - break; } } else { int max_sge = (qp_type == IB_QPT_UD || qp_type == IB_QPT_SMI @@ -533,6 +551,7 @@ static struct ehca_qp *internal_create_qp( "send_sge=%x recv_sge=%x max_sge=%x", init_attr->cap.max_send_sge, init_attr->cap.max_recv_sge, max_sge); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } } @@ -543,6 +562,7 @@ static struct ehca_qp *internal_create_qp( my_qp = kmem_cache_zalloc(qp_cache, GFP_KERNEL); if (!my_qp) { ehca_err(pd->device, "pd=%p not enough memory to alloc qp", pd); + atomic_dec(&shca->num_qps); return ERR_PTR(-ENOMEM); } @@ -823,6 +843,7 @@ create_qp_exit1: 
create_qp_exit0: kmem_cache_free(qp_cache, my_qp); + atomic_dec(&shca->num_qps); return ERR_PTR(ret); } @@ -1948,6 +1969,7 @@ static int internal_destroy_qp(struct ib_device *dev, struct ehca_qp *my_qp, if (HAS_SQ(my_qp)) ipz_queue_dtor(my_pd, &my_qp->ipz_squeue); kmem_cache_free(qp_cache, my_qp); + atomic_dec(&shca->num_qps); return 0; } -- 1.5.5 From rdreier at cisco.com Tue Apr 29 11:18:05 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:18:05 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: <1209460653.28929.1.camel@mtls03> (Eli Cohen's message of "Tue, 29 Apr 2008 12:17:33 +0300") References: <1209460653.28929.1.camel@mtls03> Message-ID: > When the child joins the broadcast group reset the mtu to > the real one. This changelog is a little too short for me to understand what this is fixing. It seems that child devices are left with a bogus MTU until they complete their multicast join, is that it? > + priv->dev->mtu = IPOIB_UD_MTU(priv->max_ib_mtu); > + priv->mcast_mtu = priv->admin_mtu = priv->dev->mtu; Do child devices also need to copy over the checksum offload/LSO stuff from the parent? - R. From rdreier at cisco.com Tue Apr 29 11:20:28 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:20:28 -0700 Subject: [ofa-general] Re: [REPOST][PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs In-Reply-To: <200804292015.38321.ossrosch@linux.vnet.ibm.com> (Stefan Roscher's message of "Tue, 29 Apr 2008 20:15:36 +0200") References: <200804291744.17235.ossrosch@linux.vnet.ibm.com> <200804292015.38321.ossrosch@linux.vnet.ibm.com> Message-ID: thanks, makes sense, applied. 
fast turnaround too ;) From rdreier at cisco.com Tue Apr 29 11:25:49 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:25:49 -0700 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: <200804241109.52448.okir@lst.de> (Olaf Kirch's message of "Thu, 24 Apr 2008 11:09:51 +0200") References: <200804241106.57172.okir@lst.de> <200804241108.58748.okir@lst.de> <200804241109.52448.okir@lst.de> Message-ID: > Content-Transfer-Encoding: quoted-printable ugh, mangled patch. simple enough that I applied it by hand as separate patches to mthca and mlx4. - R. From michael.heinz at qlogic.com Tue Apr 29 11:59:28 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 13:59:28 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: References: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Message-ID: Roland, Boris, Good news for you, bad news for me. When I switched a different machine over to a Tavor HCA, the HCA came up as expected. So, the problem is either with the particular machine or particular HCA. I'll keep playing to see if I can isolate the important factor, but it doesn't look like an OFED problem. (unless it's a problem with early Opteron boxes or something like that...) 
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Tuesday, April 29, 2008 1:49 PM To: Mike Heinz Cc: Boris Shpolyansky; general at lists.openfabrics.org Subject: Re: [ofa-general] Can't Initialize an MT23108 HCA > I'm looking for another suitable machine to see if I get the same > behavior. What are the details of the machine where you're seeing this problem? I seem to recall some ancient Dell systems had problems with PCI-X HCAs not reappearing on PCI after an HCA reset. Also it might be worth checking that your BIOS is up-to-date. - R. From eli at dev.mellanox.co.il Tue Apr 29 12:25:18 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 29 Apr 2008 22:25:18 +0300 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: References: <1209460653.28929.1.camel@mtls03> Message-ID: <4e6a6b3c0804291225v465e02a4u725018431e94d038@mail.gmail.com> On Tue, Apr 29, 2008 at 9:18 PM, Roland Dreier wrote: > > This changelog is a little too short for me to understand what this is > fixing. It seems that child devices are left with a bogus MTU until > they complete their multicast join, is that it? The situation is even worse: even when the multicast join completes, the device's MTU will not be updated, since the statement dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); at ipoib_mcast_join_task() yields zero because the admin mtu is zero. > > Do child devices also need to copy over the checksum offload/LSO stuff > from the parent? > I think they do, but it would require using two flag fields in the private data. priv->flags would save flags that relate to the state of the net device, and say, priv->cap_flags, to save stuff like LRO, checksum or any other stuff related to capabilities. What do you think? 
From rdreier at cisco.com Tue Apr 29 12:46:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 12:46:47 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: <4e6a6b3c0804291225v465e02a4u725018431e94d038@mail.gmail.com> (Eli Cohen's message of "Tue, 29 Apr 2008 22:25:18 +0300") References: <1209460653.28929.1.camel@mtls03> <4e6a6b3c0804291225v465e02a4u725018431e94d038@mail.gmail.com> Message-ID: > I think they do but it would require using two fields for flags at the > private data. priv->flags would > save flags that relate to the state of the net device, and say, > priv->cap_flags, to save stuff like LRO, > checksum or any other stuff related to capabilities. What do you think? We could do that or just copy only the flags that should be copied when creating a child device. - R. From eli at dev.mellanox.co.il Tue Apr 29 12:47:12 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 29 Apr 2008 22:47:12 +0300 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: References: <1209370487.11248.1.camel@mtls03> Message-ID: <4e6a6b3c0804291247g5fc3cd6dw357d7a877f48ceee@mail.gmail.com> > > I didn't see any version of ipoib_ib.c ever in the kernel tree that had > a drop: label. > I must have had some patches stacked in my git tree. Thanks. From eli at dev.mellanox.co.il Tue Apr 29 12:50:19 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 29 Apr 2008 22:50:19 +0300 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: References: <1209460653.28929.1.camel@mtls03> <4e6a6b3c0804291225v465e02a4u725018431e94d038@mail.gmail.com> Message-ID: <4e6a6b3c0804291250p2e2cb4dfk4319f25f04bf13c7@mail.gmail.com> > > We could do that or just copy only the flags that should be copied when > creating a child device. > Or we could define a "clone" function that will have the wisdom of which flags to copy. 
From rdreier at cisco.com Tue Apr 29 13:19:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 13:19:22 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: <1209460653.28929.1.camel@mtls03> (Eli Cohen's message of "Tue, 29 Apr 2008 12:17:33 +0300") References: <1209460653.28929.1.camel@mtls03> Message-ID: anyway, I applied this at least. From gstreiff at neteffect.com Tue Apr 29 13:24:30 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Tue, 29 Apr 2008 15:24:30 -0500 Subject: [ofa-general] [ PATCH 2/3 v2 ] RDMA/nes SFP+ enablement Message-ID: <200804292024.m3TKOU3w023065@velma.neteffect.com> From: Eric Schneider This patch enables the iw_nes module for NetEffect RNICs to support additional PHYs including SFP+ optical transceivers (referred to as ARGUS in the code). Signed-off-by: Eric Schneider Signed-off-by: Glenn Streiff --- Roland, here is the cleaned up sfp patch. drivers/infiniband/hw/nes/nes.h | 4 - drivers/infiniband/hw/nes/nes_hw.c | 221 +++++++++++++++++++++++++++++---- drivers/infiniband/hw/nes/nes_hw.h | 6 + drivers/infiniband/hw/nes/nes_nic.c | 72 +++++++---- drivers/infiniband/hw/nes/nes_utils.c | 10 - 5 files changed, 249 insertions(+), 64 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index 484b5e3..1f9f7bf 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -536,8 +536,8 @@ int nes_register_ofa_device(struct nes_i int nes_read_eeprom_values(struct nes_device *, struct nes_adapter *); void nes_write_1G_phy_reg(struct nes_device *, u8, u8, u16); void nes_read_1G_phy_reg(struct nes_device *, u8, u8, u16 *); -void nes_write_10G_phy_reg(struct nes_device *, u16, u8, u16); -void nes_read_10G_phy_reg(struct nes_device *, u16, u8); +void nes_write_10G_phy_reg(struct nes_device *, u16, u8, u16, u16); +void nes_read_10G_phy_reg(struct nes_device *, u8, u8, u16); struct nes_cqp_request 
*nes_get_cqp_request(struct nes_device *); void nes_post_cqp_request(struct nes_device *, struct nes_cqp_request *, int); int nes_arp_table(struct nes_device *, u32, u8 *, u32); diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 197eee9..0887ed5 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -1208,11 +1208,16 @@ int nes_init_phy(struct nes_device *nesd { struct nes_adapter *nesadapter = nesdev->nesadapter; u32 counter = 0; + u32 sds_common_control0; u32 mac_index = nesdev->mac_index; - u32 tx_config; + u32 tx_config = 0; u16 phy_data; + u32 temp_phy_data = 0; + u32 temp_phy_data2 = 0; + u32 i = 0; - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { nes_debug(NES_DBG_PHY, "1G PHY, mac_index = %d.\n", mac_index); if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_1G) { printk(PFX "%s: Programming mdc config for 1G\n", __func__); @@ -1278,12 +1283,116 @@ int nes_init_phy(struct nes_device *nesd nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], phy_data | 0x0300); } else { - if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) { + if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { /* setup 10G MDIO operation */ tx_config = nes_read_indexed(nesdev, NES_IDX_MAC_TX_CONFIG); tx_config |= 0x14; nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } + if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + mdelay(10); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + + /* if 
firmware is already running (like from a driver un-load/load, don't do anything. */ + if (temp_phy_data == temp_phy_data2) { + /* configure QT2505 AMCC PHY */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0x0000, 0x8000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc302, 0x0044); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc318, 0x0052); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc319, 0x0008); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc31a, 0x0098); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0026, 0x0E00); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0027, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0028, 0xA528); + + /* + * remove micro from reset; chip boots from ROM, + * uploads EEPROM f/w image, uC executes f/w + */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0002); + + /* wait for heart beat to start to know loading is done */ + counter = 0; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from heartbeat check \n"); + break; + } + mdelay(100); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + } while ((temp_phy_data2 == temp_phy_data)); + + /* wait for tracking to start to know f/w is good to go */ + counter = 0; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7fd); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if (counter++ > 1000) { + 
nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from status check \n"); + break; + } + mdelay(1000); + /* + * nes_debug(NES_DBG_PHY, "AMCC PHY- phy_status not ready yet = 0x%02X\n", + * temp_phy_data); + */ + } while (((temp_phy_data & 0xff) != 0x50) && ((temp_phy_data & 0xff) != 0x70)); + + /* set LOS Control invert RXLOSB_I_PADINV */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd003, 0x0000); + /* set LOS Control to mask of RXLOSB_I */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc314, 0x0042); + /* set LED1 to input mode (LED1 and LED2 share same LED) */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd006, 0x0007); + /* set LED2 to RX link_status and activity */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd007, 0x000A); + /* set LED3 to RX link_status */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd008, 0x0009); + + /* + * reset the res-calibration on t2 serdes; + * ensures it is stable after the amcc phy is stable + */ + + sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); + sds_common_control0 |= 0x1; + nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); + + /* release the res-calibration reset */ + sds_common_control0 &= 0xfffffffe; + nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); + + i = 0; + while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) + && (i++ < 5000)) { + /* mdelay(1); */ + } + + /* + * wait for link train done before moving on, + * or will get an interupt storm + */ + counter = 0; + do { + temp_phy_data = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + + (0x200 * (nesdev->mac_index & 1))); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from link train wait \n"); + break; + } + mdelay(1); + } while (((temp_phy_data & 0x0f1f0000) != 0x0f0f0000)); + } + 
} } return 0; } @@ -2107,6 +2216,8 @@ static void nes_process_mac_intr(struct u32 u32temp; u16 phy_data; u16 temp_phy_data; + u32 pcs_val = 0x0f0f0000; + u32 pcs_mask = 0x0f1f0000; spin_lock_irqsave(&nesadapter->phy_lock, flags); if (nesadapter->mac_sw_state[mac_number] != NES_MAC_SW_IDLE) { @@ -2170,13 +2281,30 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "Eth SERDES Common Status: 0=0x%08X, 1=0x%08X\n", nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0), nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0+0x200)); - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + + if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_PUMA_1G) { + switch (mac_index) { + case 1: + case 3: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + 0x200); + break; + default: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0); + break; + } + } else { + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index & 1) * 0x200)); + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index & 1) * 0x200)); + } + nes_debug(NES_DBG_PHY, "PCS PHY Control/Status%u: 0x%08X\n", mac_index, pcs_control_status); - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { u32temp = 0x01010000; if (nesadapter->port_count > 2) { u32temp |= 0x02020000; @@ -2185,24 +2313,59 @@ static void nes_process_mac_intr(struct phy_data = 0; nes_debug(NES_DBG_PHY, "PCS says the link is down\n"); } - } else if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) { - nes_read_10G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index]); - temp_phy_data = (u16)nes_read_indexed(nesdev, - NES_IDX_MAC_MDIO_CONTROL); - u32temp = 20; 
- do { - nes_read_10G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index]); - phy_data = (u16)nes_read_indexed(nesdev, - NES_IDX_MAC_MDIO_CONTROL); - if ((phy_data == temp_phy_data) || (!(--u32temp))) - break; - temp_phy_data = phy_data; - } while (1); - nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", - __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); - } else { - phy_data = (0x0f0f0000 == (pcs_control_status & 0x0f1f0000)) ? 4 : 0; + switch (nesadapter->phy_type[mac_index]) { + case NES_PHY_TYPE_IRIS: + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 20; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down[mac_index] ? 
"DOWN" : "UP"); + break; + + case NES_PHY_TYPE_ARGUS: + /* clear the alarms */ + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0x0008); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc001); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc002); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc005); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc006); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9003); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9004); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9005); + /* check link status */ + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 100; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); + break; + + case NES_PHY_TYPE_PUMA_1G: + if (mac_index < 2) + pcs_val = pcs_mask = 0x01010000; + else + pcs_val = pcs_mask = 0x02020000; + /* fall through */ + default: + phy_data = (pcs_val == (pcs_control_status & pcs_mask)) ? 0x4 : 0x0; + break; + } } if (phy_data & 0x0004) { @@ -2211,8 +2374,8 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "The Link is UP!!. 
linkup was %d\n", nesvnic->linkup); if (nesvnic->linkup == 0) { - printk(PFX "The Link is now up for port %u, netdev %p.\n", - mac_index, nesvnic->netdev); + printk(PFX "The Link is now up for port %s, netdev %p.\n", + nesvnic->netdev->name, nesvnic->netdev); if (netif_queue_stopped(nesvnic->netdev)) netif_start_queue(nesvnic->netdev); nesvnic->linkup = 1; @@ -2225,8 +2388,8 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "The Link is Down!!. linkup was %d\n", nesvnic->linkup); if (nesvnic->linkup == 1) { - printk(PFX "The Link is now down for port %u, netdev %p.\n", - mac_index, nesvnic->netdev); + printk(PFX "The Link is now down for port %s, netdev %p.\n", + nesvnic->netdev->name, nesvnic->netdev); if (!(netif_queue_stopped(nesvnic->netdev))) netif_stop_queue(nesvnic->netdev); nesvnic->linkup = 0; diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 1363995..7d47f92 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -35,8 +35,10 @@ #define __NES_HW_H #include -#define NES_PHY_TYPE_1G 2 -#define NES_PHY_TYPE_IRIS 3 +#define NES_PHY_TYPE_1G 2 +#define NES_PHY_TYPE_IRIS 3 +#define NES_PHY_TYPE_ARGUS 4 +#define NES_PHY_TYPE_PUMA_1G 5 #define NES_PHY_TYPE_PUMA_10G 6 #define NES_MULTICAST_PF_MAX 8 diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 6998af0..d65a846 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -1377,21 +1377,29 @@ static int nes_netdev_get_settings(struc et_cmd->duplex = DUPLEX_FULL; et_cmd->port = PORT_MII; + if (nesadapter->OneG_Mode) { - et_cmd->supported = SUPPORTED_1000baseT_Full|SUPPORTED_Autoneg; - et_cmd->advertising = ADVERTISED_1000baseT_Full|ADVERTISED_Autoneg; et_cmd->speed = SPEED_1000; - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], - &phy_data); - if (phy_data&0x1000) { - et_cmd->autoneg = AUTONEG_ENABLE; + if 
(nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + et_cmd->supported = SUPPORTED_1000baseT_Full; + et_cmd->advertising = ADVERTISED_1000baseT_Full; + et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->transceiver = XCVR_INTERNAL; + et_cmd->phy_address = nesdev->mac_index; } else { - et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->supported = SUPPORTED_1000baseT_Full | SUPPORTED_Autoneg; + et_cmd->advertising = ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg; + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], &phy_data); + if (phy_data & 0x1000) + et_cmd->autoneg = AUTONEG_ENABLE; + else + et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->transceiver = XCVR_EXTERNAL; + et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } - et_cmd->transceiver = XCVR_EXTERNAL; - et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } else { - if (nesadapter->phy_type[nesvnic->logical_port] == NES_PHY_TYPE_IRIS) { + if ((nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_ARGUS)) { et_cmd->transceiver = XCVR_EXTERNAL; et_cmd->port = PORT_FIBRE; et_cmd->supported = SUPPORTED_FIBRE; @@ -1422,7 +1430,8 @@ static int nes_netdev_set_settings(struc struct nes_adapter *nesadapter = nesdev->nesadapter; u16 phy_data; - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G)) { nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], &phy_data); if (et_cmd->autoneg) { @@ -1615,27 +1624,34 @@ struct net_device *nes_netdev_init(struc list_add_tail(&nesvnic->list, &nesdev->nesadapter->nesvnic_list[nesdev->mac_index]); if ((nesdev->netdev_count == 0) && - (PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index)) { - nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. 
Using register index 0x%04X\n", - NES_IDX_PHY_PCS_CONTROL_STATUS0+(0x200*(nesvnic->logical_port&1))); + ((PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index) || + ((nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) && + (((PCI_FUNC(nesdev->pcidev->devfn) == 1) && (nesdev->mac_index == 2)) || + ((PCI_FUNC(nesdev->pcidev->devfn) == 2) && (nesdev->mac_index == 1)))))) { + /* + * nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. Using register index 0x%04X\n", + * NES_IDX_PHY_PCS_CONTROL_STATUS0 + (0x200 * (nesvnic->logical_port & 1))); + */ u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1))); - u32temp |= 0x00200000; - nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1)), u32temp); + (0x200 * (nesdev->mac_index & 1))); + if (nesdev->nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G) { + u32temp |= 0x00200000; + nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + + (0x200 * (nesdev->mac_index & 1)), u32temp); + } + u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1)) ); + (0x200 * (nesdev->mac_index & 1))); + if ((u32temp&0x0f1f0000) == 0x0f0f0000) { - if (nesdev->nesadapter->phy_type[nesvnic->logical_port] == NES_PHY_TYPE_IRIS) { + if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) { nes_init_phy(nesdev); - nes_read_10G_phy_reg(nesdev, 1, - nesdev->nesadapter->phy_index[nesvnic->logical_port]); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); u32temp = 20; do { - nes_read_10G_phy_reg(nesdev, 1, - nesdev->nesadapter->phy_index[nesvnic->logical_port]); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if ((phy_data == temp_phy_data) || 
(!(--u32temp))) @@ -1652,6 +1668,14 @@ struct net_device *nes_netdev_init(struc nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); nesvnic->linkup = 1; } + } else if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + nes_debug(NES_DBG_INIT, "mac_index=%d, logical_port=%d, u32temp=0x%04X, PCI_FUNC=%d\n", + nesdev->mac_index, nesvnic->logical_port, u32temp, PCI_FUNC(nesdev->pcidev->devfn)); + if (((nesdev->mac_index < 2) && ((u32temp&0x01010000) == 0x01010000)) || + ((nesdev->mac_index > 1) && ((u32temp&0x02020000) == 0x02020000))) { + nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); + nesvnic->linkup = 1; + } } /* clear the MAC interrupt status, assumes direct logical to physical mapping */ u32temp = nes_read_indexed(nesdev, NES_IDX_MAC_INT_STATUS + (0x200 * nesdev->mac_index)); diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c index c6d5631..fe83d1b 100644 --- a/drivers/infiniband/hw/nes/nes_utils.c +++ b/drivers/infiniband/hw/nes/nes_utils.c @@ -444,15 +444,13 @@ void nes_read_1G_phy_reg(struct nes_devi /** * nes_write_10G_phy_reg */ -void nes_write_10G_phy_reg(struct nes_device *nesdev, u16 phy_reg, - u8 phy_addr, u16 data) +void nes_write_10G_phy_reg(struct nes_device *nesdev, u16 phy_addr, u8 dev_addr, u16 phy_reg, + u16 data) { - u32 dev_addr; u32 port_addr; u32 u32temp; u32 counter; - dev_addr = 1; port_addr = phy_addr; /* set address */ @@ -492,14 +490,12 @@ void nes_write_10G_phy_reg(struct nes_de * This routine only issues the read, the data must be read * separately. 
*/ -void nes_read_10G_phy_reg(struct nes_device *nesdev, u16 phy_reg, u8 phy_addr) +void nes_read_10G_phy_reg(struct nes_device *nesdev, u8 phy_addr, u8 dev_addr, u16 phy_reg) { - u32 dev_addr; u32 port_addr; u32 u32temp; u32 counter; - dev_addr = 1; port_addr = phy_addr; /* set address */ From gstreiff at neteffect.com Tue Apr 29 13:25:01 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Tue, 29 Apr 2008 15:25:01 -0500 Subject: [ofa-general] [ PATCH 3/3 v2 ] RDMA/nes Formatting cleanup Message-ID: <200804292025.m3TKP1im023075@velma.neteffect.com> Various cleanups: Change // to /* .. */ Place whitespace around binary operators. Trim down a few long lines. Some minor alignment formatting for better readability. Remove some silly tabs. Signed-off-by: Glenn Streiff --- Roland, this is the replacement patch for "RDMA/nes SFP+ cleanup". I've fixed the whitespace issue with the array indices and swept through a bit more code. Feelings will not be hurt if I still don't have it right...can always punt on this patch if necessary. 
Glenn drivers/infiniband/hw/nes/nes_cm.c | 8 +-- drivers/infiniband/hw/nes/nes_hw.c | 103 +++++++++++++++++---------------- drivers/infiniband/hw/nes/nes_hw.h | 2 - drivers/infiniband/hw/nes/nes_nic.c | 96 ++++++++++++++++--------------- drivers/infiniband/hw/nes/nes_verbs.c | 2 - 5 files changed, 109 insertions(+), 102 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index d940fc2..9a4b40f 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -594,7 +594,7 @@ static void nes_cm_timer_tick(unsigned l continue; } /* this seems like the correct place, but leave send entry unprotected */ - // spin_unlock_irqrestore(&cm_node->retrans_list_lock, flags); + /* spin_unlock_irqrestore(&cm_node->retrans_list_lock, flags); */ atomic_inc(&send_entry->skb->users); cm_packets_retrans++; nes_debug(NES_DBG_CM, "Retransmitting send_entry %p for node %p," @@ -1335,7 +1335,7 @@ static int process_packet(struct nes_cm_ cm_node->loc_addr, cm_node->loc_port, cm_node->rem_addr, cm_node->rem_port, cm_node->state, atomic_read(&cm_node->ref_count)); - // create event + /* create event */ cm_node->state = NES_CM_STATE_CLOSED; create_event(cm_node, NES_CM_EVENT_ABORTED); @@ -1669,7 +1669,7 @@ static struct nes_cm_node *mini_cm_conne if (!cm_node) return NULL; - // set our node side to client (active) side + /* set our node side to client (active) side */ cm_node->tcp_cntxt.client = 1; cm_node->tcp_cntxt.rcv_wscale = NES_CM_DEFAULT_RCV_WND_SCALE; @@ -1694,7 +1694,7 @@ static struct nes_cm_node *mini_cm_conne loopbackremotenode->mpa_frame_size = mpa_frame_size - sizeof(struct ietf_mpa_frame); - // we are done handling this state, set node to a TSA state + /* we are done handling this state, set node to a TSA state */ cm_node->state = NES_CM_STATE_TSA; cm_node->tcp_cntxt.rcv_nxt = loopbackremotenode->tcp_cntxt.loc_seq_num; loopbackremotenode->tcp_cntxt.rcv_nxt = cm_node->tcp_cntxt.loc_seq_num; diff --git 
a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 0887ed5..1c02639 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -833,7 +833,7 @@ static void nes_init_csr_ne020(struct ne nes_write_indexed(nesdev, 0x00000900, 0x20000001); nes_write_indexed(nesdev, 0x000060C0, 0x0000028e); nes_write_indexed(nesdev, 0x000060C8, 0x00000020); - // + nes_write_indexed(nesdev, 0x000001EC, 0x7b2625a0); /* nes_write_indexed(nesdev, 0x000001EC, 0x5f2625a0); */ @@ -1229,7 +1229,7 @@ int nes_init_phy(struct nes_device *nesd nes_read_1G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 1 phy address %u = 0x%X.\n", nesadapter->phy_index[mac_index], phy_data); - nes_write_1G_phy_reg(nesdev, 23, nesadapter->phy_index[mac_index], 0xb000); + nes_write_1G_phy_reg(nesdev, 23, nesadapter->phy_index[mac_index], 0xb000); /* Reset the PHY */ nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], 0x8000); @@ -1363,7 +1363,7 @@ int nes_init_phy(struct nes_device *nesd * ensures it is stable after the amcc phy is stable */ - sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); + sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); sds_common_control0 |= 0x1; nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); @@ -1372,7 +1372,7 @@ int nes_init_phy(struct nes_device *nesd nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); i = 0; - while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) + while (((nes_read32(nesdev->regs + NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) && (i++ < 5000)) { /* mdelay(1); */ } @@ -1649,10 +1649,10 @@ int nes_init_nic_qp(struct nes_device *n } u64temp = (u64)nesvnic->nic.sq_pbase; - nic_context->context_words[NES_NIC_CTX_SQ_LOW_IDX] = cpu_to_le32((u32)u64temp); + 
nic_context->context_words[NES_NIC_CTX_SQ_LOW_IDX] = cpu_to_le32((u32)u64temp); nic_context->context_words[NES_NIC_CTX_SQ_HIGH_IDX] = cpu_to_le32((u32)(u64temp >> 32)); u64temp = (u64)nesvnic->nic.rq_pbase; - nic_context->context_words[NES_NIC_CTX_RQ_LOW_IDX] = cpu_to_le32((u32)u64temp); + nic_context->context_words[NES_NIC_CTX_RQ_LOW_IDX] = cpu_to_le32((u32)u64temp); nic_context->context_words[NES_NIC_CTX_RQ_HIGH_IDX] = cpu_to_le32((u32)(u64temp >> 32)); cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_CREATE_QP | @@ -1704,7 +1704,7 @@ int nes_init_nic_qp(struct nes_device *n nic_rqe = &nesvnic->nic.rq_vbase[counter]; nic_rqe->wqe_words[NES_NIC_RQ_WQE_LENGTH_1_0_IDX] = cpu_to_le32(nesvnic->max_frame_size); nic_rqe->wqe_words[NES_NIC_RQ_WQE_LENGTH_3_2_IDX] = 0; - nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX] = cpu_to_le32((u32)pmem); + nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX] = cpu_to_le32((u32)pmem); nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX] = cpu_to_le32((u32)((u64)pmem >> 32)); nesvnic->nic.rx_skb[counter] = skb; } @@ -1728,13 +1728,13 @@ int nes_init_nic_qp(struct nes_device *n jumbomode = 1; nes_nic_init_timer_defaults(nesdev, jumbomode); } - nesvnic->lro_mgr.max_aggr = NES_LRO_MAX_AGGR; - nesvnic->lro_mgr.max_desc = NES_MAX_LRO_DESCRIPTORS; - nesvnic->lro_mgr.lro_arr = nesvnic->lro_desc; + nesvnic->lro_mgr.max_aggr = NES_LRO_MAX_AGGR; + nesvnic->lro_mgr.max_desc = NES_MAX_LRO_DESCRIPTORS; + nesvnic->lro_mgr.lro_arr = nesvnic->lro_desc; nesvnic->lro_mgr.get_skb_header = nes_lro_get_skb_hdr; - nesvnic->lro_mgr.features = LRO_F_NAPI | LRO_F_EXTRACT_VLAN_ID; - nesvnic->lro_mgr.dev = netdev; - nesvnic->lro_mgr.ip_summed = CHECKSUM_UNNECESSARY; + nesvnic->lro_mgr.features = LRO_F_NAPI | LRO_F_EXTRACT_VLAN_ID; + nesvnic->lro_mgr.dev = netdev; + nesvnic->lro_mgr.ip_summed = CHECKSUM_UNNECESSARY; nesvnic->lro_mgr.ip_summed_aggr = CHECKSUM_UNNECESSARY; return 0; } @@ -1755,8 +1755,8 @@ void nes_destroy_nic_qp(struct nes_vnic /* Free 
remaining NIC receive buffers */ while (nesvnic->nic.rq_head != nesvnic->nic.rq_tail) { - nic_rqe = &nesvnic->nic.rq_vbase[nesvnic->nic.rq_tail]; - wqe_frag = (u64)le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX]); + nic_rqe = &nesvnic->nic.rq_vbase[nesvnic->nic.rq_tail]; + wqe_frag = (u64)le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX]); wqe_frag |= ((u64)le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX])) << 32; pci_unmap_single(nesdev->pcidev, (dma_addr_t)wqe_frag, nesvnic->max_frame_size, PCI_DMA_FROMDEVICE); @@ -1839,17 +1839,17 @@ int nes_napi_isr(struct nes_device *nesd /* iff NIC, process here, else wait for DPC */ if ((int_stat) && ((int_stat & 0x0000ff00) == int_stat)) { nesdev->napi_isr_ran = 0; - nes_write32(nesdev->regs+NES_INT_STAT, - (int_stat & - ~(NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0|NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3))); + nes_write32(nesdev->regs + NES_INT_STAT, + (int_stat & + ~(NES_INT_INTF | NES_INT_TIMER | NES_INT_MAC0 | NES_INT_MAC1 | NES_INT_MAC2 | NES_INT_MAC3))); /* Process the CEQs */ nes_process_ceq(nesdev, &nesdev->nesadapter->ceq[nesdev->nic_ceq_index]); if (unlikely((((nesadapter->et_rx_coalesce_usecs_irq) && - (!nesadapter->et_use_adaptive_rx_coalesce)) || - ((nesadapter->et_use_adaptive_rx_coalesce) && - (nesdev->deepcq_count > nesadapter->et_pkt_rate_low)))) ) { + (!nesadapter->et_use_adaptive_rx_coalesce)) || + ((nesadapter->et_use_adaptive_rx_coalesce) && + (nesdev->deepcq_count > nesadapter->et_pkt_rate_low))))) { if ((nesdev->int_req & NES_INT_TIMER) == 0) { /* Enable Periodic timer interrupts */ nesdev->int_req |= NES_INT_TIMER; @@ -1927,12 +1927,12 @@ void nes_dpc(unsigned long param) } if (int_stat) { - if (int_stat & ~(NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0| - NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3)) { + if (int_stat & ~(NES_INT_INTF | NES_INT_TIMER | NES_INT_MAC0| + NES_INT_MAC1|NES_INT_MAC2 | NES_INT_MAC3)) { /* Ack the interrupts */ nes_write32(nesdev->regs+NES_INT_STAT, - 
(int_stat & ~(NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0| - NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3))); + (int_stat & ~(NES_INT_INTF | NES_INT_TIMER | NES_INT_MAC0| + NES_INT_MAC1 | NES_INT_MAC2 | NES_INT_MAC3))); } temp_int_stat = int_stat; @@ -1997,8 +1997,8 @@ void nes_dpc(unsigned long param) } } /* Don't use the interface interrupt bit stay in loop */ - int_stat &= ~NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0| - NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3; + int_stat &= ~NES_INT_INTF | NES_INT_TIMER | NES_INT_MAC0 | + NES_INT_MAC1 | NES_INT_MAC2 | NES_INT_MAC3; } while ((int_stat != 0) && (loop_counter++ < MAX_DPC_ITERATIONS)); if (timer_ints == 1) { @@ -2009,9 +2009,9 @@ void nes_dpc(unsigned long param) nesdev->timer_only_int_count = 0; nesdev->int_req &= ~NES_INT_TIMER; nes_write32(nesdev->regs + NES_INTF_INT_MASK, ~(nesdev->intf_int_req)); - nes_write32(nesdev->regs+NES_INT_MASK, ~nesdev->int_req); + nes_write32(nesdev->regs + NES_INT_MASK, ~nesdev->int_req); } else { - nes_write32(nesdev->regs+NES_INT_MASK, 0x0000ffff|(~nesdev->int_req)); + nes_write32(nesdev->regs+NES_INT_MASK, 0x0000ffff | (~nesdev->int_req)); } } else { if (unlikely(nesadapter->et_use_adaptive_rx_coalesce)) @@ -2019,7 +2019,7 @@ void nes_dpc(unsigned long param) nes_nic_init_timer(nesdev); } nesdev->timer_only_int_count = 0; - nes_write32(nesdev->regs+NES_INT_MASK, 0x0000ffff|(~nesdev->int_req)); + nes_write32(nesdev->regs+NES_INT_MASK, 0x0000ffff | (~nesdev->int_req)); } } else { nesdev->timer_only_int_count = 0; @@ -2068,7 +2068,7 @@ static void nes_process_ceq(struct nes_d do { if (le32_to_cpu(ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_HIGH_IDX]) & NES_CEQE_VALID) { - u64temp = (((u64)(le32_to_cpu(ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_HIGH_IDX])))<<32) | + u64temp = (((u64)(le32_to_cpu(ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_HIGH_IDX]))) << 32) | ((u64)(le32_to_cpu(ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_LOW_IDX]))); u64temp <<= 1; cq = *((struct nes_hw_cq 
**)&u64temp); @@ -2096,7 +2096,7 @@ static void nes_process_ceq(struct nes_d */ static void nes_process_aeq(struct nes_device *nesdev, struct nes_hw_aeq *aeq) { -// u64 u64temp; + /* u64 u64temp; */ u32 head; u32 aeq_size; u32 aeqe_misc; @@ -2115,8 +2115,10 @@ static void nes_process_aeq(struct nes_d if (aeqe_misc & (NES_AEQE_QP|NES_AEQE_CQ)) { if (aeqe_cq_id >= NES_FIRST_QPN) { /* dealing with an accelerated QP related AE */ -// u64temp = (((u64)(le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_HIGH_IDX])))<<32) | -// ((u64)(le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]))); + /* + * u64temp = (((u64)(le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_HIGH_IDX]))) << 32) | + * ((u64)(le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]))); + */ nes_process_iwarp_aeqe(nesdev, (struct nes_hw_aeqe *)aeqe); } else { /* TODO: dealing with a CQP related AE */ @@ -2464,8 +2466,10 @@ void nes_nic_ce_handler(struct nes_devic /* bump past the vlan tag */ wqe_fragment_length++; if (le16_to_cpu(wqe_fragment_length[wqe_fragment_index]) != 0) { - u64temp = (u64) le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX+wqe_fragment_index*2]); - u64temp += ((u64)le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX+wqe_fragment_index*2]))<<32; + u64temp = (u64) le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX + + wqe_fragment_index * 2]); + u64temp += ((u64)le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX + + wqe_fragment_index * 2])) << 32; bus_address = (dma_addr_t)u64temp; if (test_and_clear_bit(nesnic->sq_tail, nesnic->first_frag_overflow)) { pci_unmap_single(nesdev->pcidev, @@ -2475,8 +2479,10 @@ void nes_nic_ce_handler(struct nes_devic } for (; wqe_fragment_index < 5; wqe_fragment_index++) { if (wqe_fragment_length[wqe_fragment_index]) { - u64temp = le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX+wqe_fragment_index*2]); - u64temp += 
((u64)le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX+wqe_fragment_index*2]))<<32; + u64temp = le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX + + wqe_fragment_index * 2]); + u64temp += ((u64)le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX + + wqe_fragment_index * 2])) <<32; bus_address = (dma_addr_t)u64temp; pci_unmap_page(nesdev->pcidev, bus_address, @@ -2523,7 +2529,7 @@ void nes_nic_ce_handler(struct nes_devic if (atomic_read(&nesvnic->rx_skbs_needed) > (nesvnic->nic.rq_size>>1)) { nes_write32(nesdev->regs+NES_CQE_ALLOC, cq->cq_number | (cqe_count << 16)); -// nesadapter->tune_timer.cq_count += cqe_count; + /* nesadapter->tune_timer.cq_count += cqe_count; */ nesdev->currcq_count += cqe_count; cqe_count = 0; nes_replenish_nic_rq(nesvnic); @@ -2598,7 +2604,7 @@ void nes_nic_ce_handler(struct nes_devic /* Replenish Nic CQ */ nes_write32(nesdev->regs+NES_CQE_ALLOC, cq->cq_number | (cqe_count << 16)); -// nesdev->nesadapter->tune_timer.cq_count += cqe_count; + /* nesdev->nesadapter->tune_timer.cq_count += cqe_count; */ nesdev->currcq_count += cqe_count; cqe_count = 0; } @@ -2626,7 +2632,7 @@ void nes_nic_ce_handler(struct nes_devic cq->cqe_allocs_pending = cqe_count; if (unlikely(nesadapter->et_use_adaptive_rx_coalesce)) { -// nesdev->nesadapter->tune_timer.cq_count += cqe_count; + /* nesdev->nesadapter->tune_timer.cq_count += cqe_count; */ nesdev->currcq_count += cqe_count; nes_nic_tune_timer(nesdev); } @@ -2661,7 +2667,7 @@ static void nes_cqp_ce_handler(struct ne if (le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX]) & NES_CQE_VALID) { u64temp = (((u64)(le32_to_cpu(cq->cq_vbase[head]. - cqe_words[NES_CQE_COMP_COMP_CTX_HIGH_IDX])))<<32) | + cqe_words[NES_CQE_COMP_COMP_CTX_HIGH_IDX]))) << 32) | ((u64)(le32_to_cpu(cq->cq_vbase[head]. 
cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]))); cqp = *((struct nes_hw_cqp **)&u64temp); @@ -2678,7 +2684,7 @@ static void nes_cqp_ce_handler(struct ne } u64temp = (((u64)(le32_to_cpu(nesdev->cqp.sq_vbase[cqp->sq_tail]. - wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX])))<<32) | + wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX]))) << 32) | ((u64)(le32_to_cpu(nesdev->cqp.sq_vbase[cqp->sq_tail]. wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX]))); cqp_request = *((struct nes_cqp_request **)&u64temp); @@ -2715,7 +2721,7 @@ static void nes_cqp_ce_handler(struct ne } else { nes_debug(NES_DBG_CQP, "CQP request %p (opcode 0x%02X) freed.\n", cqp_request, - le32_to_cpu(cqp_request->cqp_wqe.wqe_words[NES_CQP_WQE_OPCODE_IDX])&0x3f); + le32_to_cpu(cqp_request->cqp_wqe.wqe_words[NES_CQP_WQE_OPCODE_IDX]) & 0x3f); if (cqp_request->dynamic) { kfree(cqp_request); } else { @@ -2729,7 +2735,7 @@ static void nes_cqp_ce_handler(struct ne } cq->cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX] = 0; - nes_write32(nesdev->regs+NES_CQE_ALLOC, cq->cq_number | (1 << 16)); + nes_write32(nesdev->regs + NES_CQE_ALLOC, cq->cq_number | (1 << 16)); if (++cqp->sq_tail >= cqp->sq_size) cqp->sq_tail = 0; @@ -2798,13 +2804,13 @@ static void nes_process_iwarp_aeqe(struc nes_debug(NES_DBG_AEQ, "\n"); aeq_info = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_MISC_IDX]); if ((NES_AEQE_INBOUND_RDMA&aeq_info) || (!(NES_AEQE_QP&aeq_info))) { - context = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]); + context = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]); context += ((u64)le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_HIGH_IDX])) << 32; } else { aeqe_context = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]); aeqe_context += ((u64)le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_HIGH_IDX])) << 32; context = (unsigned long)nesadapter->qp_table[le32_to_cpu( - aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX])-NES_FIRST_QPN]; + aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX]) - NES_FIRST_QPN]; BUG_ON(!context); 
} @@ -2817,7 +2823,6 @@ static void nes_process_iwarp_aeqe(struc le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX]), aeqe, nes_tcp_state_str[tcp_state], nes_iwarp_state_str[iwarp_state]); - switch (async_event_id) { case NES_AEQE_AEID_LLP_FIN_RECEIVED: nesqp = *((struct nes_qp **)&context); @@ -3221,7 +3226,7 @@ void nes_manage_arp_cache(struct net_dev cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= cpu_to_le32(NES_CQP_ARP_VALID); cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_ADDR_LOW_IDX] = cpu_to_le32( (((u32)mac_addr[2]) << 24) | (((u32)mac_addr[3]) << 16) | - (((u32)mac_addr[4]) << 8) | (u32)mac_addr[5]); + (((u32)mac_addr[4]) << 8) | (u32)mac_addr[5]); cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_HIGH_IDX] = cpu_to_le32( (((u32)mac_addr[0]) << 16) | (u32)mac_addr[1]); } else { diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 7d47f92..6e58c44 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -969,7 +969,7 @@ #define DEFAULT_JUMBO_NES_QL_HIGH 128 #define NES_NIC_CQ_DOWNWARD_TREND 16 struct nes_hw_tune_timer { - //u16 cq_count; + /* u16 cq_count; */ u16 threshold_low; u16 threshold_target; u16 threshold_high; diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index d65a846..1b0938c 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -185,12 +185,13 @@ static int nes_netdev_open(struct net_de nic_active |= nic_active_bit; nes_write_indexed(nesdev, NES_IDX_NIC_BROADCAST_ON, nic_active); - macaddr_high = ((u16)netdev->dev_addr[0]) << 8; + macaddr_high = ((u16)netdev->dev_addr[0]) << 8; macaddr_high += (u16)netdev->dev_addr[1]; - macaddr_low = ((u32)netdev->dev_addr[2]) << 24; - macaddr_low += ((u32)netdev->dev_addr[3]) << 16; - macaddr_low += ((u32)netdev->dev_addr[4]) << 8; - macaddr_low += (u32)netdev->dev_addr[5]; + + macaddr_low = ((u32)netdev->dev_addr[2]) << 24; + macaddr_low += ((u32)netdev->dev_addr[3]) << 
16; + macaddr_low += ((u32)netdev->dev_addr[4]) << 8; + macaddr_low += (u32)netdev->dev_addr[5]; /* Program the various MAC regs */ for (i = 0; i < NES_MAX_PORT_COUNT; i++) { @@ -451,7 +452,7 @@ #define NES_MAX_TSO_FRAGS 18 __le16 *wqe_fragment_length; u32 nr_frags; u32 original_first_length; -// u64 *wqe_fragment_address; + /* u64 *wqe_fragment_address; */ /* first fragment (0) is used by copy buffer */ u16 wqe_fragment_index=1; u16 hoffset; @@ -461,11 +462,12 @@ #define NES_MAX_TSO_FRAGS 18 u32 old_head; u32 wqe_misc; - /* nes_debug(NES_DBG_NIC_TX, "%s Request to tx NIC packet length %u, headlen %u," - " (%u frags), tso_size=%u\n", - netdev->name, skb->len, skb_headlen(skb), - skb_shinfo(skb)->nr_frags, skb_is_gso(skb)); - */ + /* + * nes_debug(NES_DBG_NIC_TX, "%s Request to tx NIC packet length %u, headlen %u," + * " (%u frags), tso_size=%u\n", + * netdev->name, skb->len, skb_headlen(skb), + * skb_shinfo(skb)->nr_frags, skb_is_gso(skb)); + */ if (!netif_carrier_ok(netdev)) return NETDEV_TX_OK; @@ -795,12 +797,12 @@ static int nes_netdev_set_mac_address(st memcpy(netdev->dev_addr, mac_addr->sa_data, netdev->addr_len); printk(PFX "%s: Address length = %d, Address = %s\n", __func__, netdev->addr_len, print_mac(mac, mac_addr->sa_data)); - macaddr_high = ((u16)netdev->dev_addr[0]) << 8; + macaddr_high = ((u16)netdev->dev_addr[0]) << 8; macaddr_high += (u16)netdev->dev_addr[1]; - macaddr_low = ((u32)netdev->dev_addr[2]) << 24; - macaddr_low += ((u32)netdev->dev_addr[3]) << 16; - macaddr_low += ((u32)netdev->dev_addr[4]) << 8; - macaddr_low += (u32)netdev->dev_addr[5]; + macaddr_low = ((u32)netdev->dev_addr[2]) << 24; + macaddr_low += ((u32)netdev->dev_addr[3]) << 16; + macaddr_low += ((u32)netdev->dev_addr[4]) << 8; + macaddr_low += (u32)netdev->dev_addr[5]; for (i = 0; i < NES_MAX_PORT_COUNT; i++) { if (nesvnic->qp_nic_index[i] == 0xf) { @@ -881,12 +883,12 @@ static void nes_netdev_set_multicast_lis print_mac(mac, multicast_addr->dmi_addr), 
perfect_filter_register_address+(mc_index * 8), mc_nic_index); - macaddr_high = ((u16)multicast_addr->dmi_addr[0]) << 8; + macaddr_high = ((u16)multicast_addr->dmi_addr[0]) << 8; macaddr_high += (u16)multicast_addr->dmi_addr[1]; - macaddr_low = ((u32)multicast_addr->dmi_addr[2]) << 24; - macaddr_low += ((u32)multicast_addr->dmi_addr[3]) << 16; - macaddr_low += ((u32)multicast_addr->dmi_addr[4]) << 8; - macaddr_low += (u32)multicast_addr->dmi_addr[5]; + macaddr_low = ((u32)multicast_addr->dmi_addr[2]) << 24; + macaddr_low += ((u32)multicast_addr->dmi_addr[3]) << 16; + macaddr_low += ((u32)multicast_addr->dmi_addr[4]) << 8; + macaddr_low += (u32)multicast_addr->dmi_addr[5]; nes_write_indexed(nesdev, perfect_filter_register_address+(mc_index * 8), macaddr_low); @@ -910,23 +912,23 @@ static void nes_netdev_set_multicast_lis /** * nes_netdev_change_mtu */ -static int nes_netdev_change_mtu(struct net_device *netdev, int new_mtu) +static int nes_netdev_change_mtu(struct net_device *netdev, int new_mtu) { struct nes_vnic *nesvnic = netdev_priv(netdev); - struct nes_device *nesdev = nesvnic->nesdev; - int ret = 0; - u8 jumbomode=0; + struct nes_device *nesdev = nesvnic->nesdev; + int ret = 0; + u8 jumbomode = 0; - if ((new_mtu < ETH_ZLEN) || (new_mtu > max_mtu)) + if ((new_mtu < ETH_ZLEN) || (new_mtu > max_mtu)) return -EINVAL; - netdev->mtu = new_mtu; + netdev->mtu = new_mtu; nesvnic->max_frame_size = new_mtu + VLAN_ETH_HLEN; if (netdev->mtu > 1500) { jumbomode=1; } - nes_nic_init_timer_defaults(nesdev, jumbomode); + nes_nic_init_timer_defaults(nesdev, jumbomode); if (netif_running(netdev)) { nes_netdev_stop(netdev); @@ -1225,14 +1227,14 @@ static int nes_netdev_set_coalesce(struc struct ethtool_coalesce *et_coalesce) { struct nes_vnic *nesvnic = netdev_priv(netdev); - struct nes_device *nesdev = nesvnic->nesdev; + struct nes_device *nesdev = nesvnic->nesdev; struct nes_adapter *nesadapter = nesdev->nesadapter; struct nes_hw_tune_timer *shared_timer = 
&nesadapter->tune_timer; unsigned long flags; - spin_lock_irqsave(&nesadapter->periodic_timer_lock, flags); + spin_lock_irqsave(&nesadapter->periodic_timer_lock, flags); if (et_coalesce->rx_max_coalesced_frames_low) { - shared_timer->threshold_low = et_coalesce->rx_max_coalesced_frames_low; + shared_timer->threshold_low = et_coalesce->rx_max_coalesced_frames_low; } if (et_coalesce->rx_max_coalesced_frames_irq) { shared_timer->threshold_target = et_coalesce->rx_max_coalesced_frames_irq; @@ -1252,14 +1254,14 @@ static int nes_netdev_set_coalesce(struc nesadapter->et_rx_coalesce_usecs_irq = et_coalesce->rx_coalesce_usecs_irq; if (et_coalesce->use_adaptive_rx_coalesce) { nesadapter->et_use_adaptive_rx_coalesce = 1; - nesadapter->timer_int_limit = NES_TIMER_INT_LIMIT_DYNAMIC; + nesadapter->timer_int_limit = NES_TIMER_INT_LIMIT_DYNAMIC; nesadapter->et_rx_coalesce_usecs_irq = 0; if (et_coalesce->pkt_rate_low) { - nesadapter->et_pkt_rate_low = et_coalesce->pkt_rate_low; + nesadapter->et_pkt_rate_low = et_coalesce->pkt_rate_low; } } else { nesadapter->et_use_adaptive_rx_coalesce = 0; - nesadapter->timer_int_limit = NES_TIMER_INT_LIMIT; + nesadapter->timer_int_limit = NES_TIMER_INT_LIMIT; if (nesadapter->et_rx_coalesce_usecs_irq) { nes_write32(nesdev->regs+NES_PERIODIC_CONTROL, 0x80000000 | ((u32)(nesadapter->et_rx_coalesce_usecs_irq*8))); @@ -1276,28 +1278,28 @@ static int nes_netdev_get_coalesce(struc struct ethtool_coalesce *et_coalesce) { struct nes_vnic *nesvnic = netdev_priv(netdev); - struct nes_device *nesdev = nesvnic->nesdev; + struct nes_device *nesdev = nesvnic->nesdev; struct nes_adapter *nesadapter = nesdev->nesadapter; struct ethtool_coalesce temp_et_coalesce; struct nes_hw_tune_timer *shared_timer = &nesadapter->tune_timer; unsigned long flags; memset(&temp_et_coalesce, 0, sizeof(temp_et_coalesce)); - temp_et_coalesce.rx_coalesce_usecs_irq = nesadapter->et_rx_coalesce_usecs_irq; - temp_et_coalesce.use_adaptive_rx_coalesce = 
nesadapter->et_use_adaptive_rx_coalesce; - temp_et_coalesce.rate_sample_interval = nesadapter->et_rate_sample_interval; + temp_et_coalesce.rx_coalesce_usecs_irq = nesadapter->et_rx_coalesce_usecs_irq; + temp_et_coalesce.use_adaptive_rx_coalesce = nesadapter->et_use_adaptive_rx_coalesce; + temp_et_coalesce.rate_sample_interval = nesadapter->et_rate_sample_interval; temp_et_coalesce.pkt_rate_low = nesadapter->et_pkt_rate_low; spin_lock_irqsave(&nesadapter->periodic_timer_lock, flags); - temp_et_coalesce.rx_max_coalesced_frames_low = shared_timer->threshold_low; - temp_et_coalesce.rx_max_coalesced_frames_irq = shared_timer->threshold_target; + temp_et_coalesce.rx_max_coalesced_frames_low = shared_timer->threshold_low; + temp_et_coalesce.rx_max_coalesced_frames_irq = shared_timer->threshold_target; temp_et_coalesce.rx_max_coalesced_frames_high = shared_timer->threshold_high; - temp_et_coalesce.rx_coalesce_usecs_low = shared_timer->timer_in_use_min; + temp_et_coalesce.rx_coalesce_usecs_low = shared_timer->timer_in_use_min; temp_et_coalesce.rx_coalesce_usecs_high = shared_timer->timer_in_use_max; if (nesadapter->et_use_adaptive_rx_coalesce) { temp_et_coalesce.rx_coalesce_usecs_irq = shared_timer->timer_in_use; } spin_unlock_irqrestore(&nesadapter->periodic_timer_lock, flags); - memcpy(et_coalesce, &temp_et_coalesce, sizeof(*et_coalesce)); + memcpy(et_coalesce, &temp_et_coalesce, sizeof(*et_coalesce)); return 0; } @@ -1376,7 +1378,7 @@ static int nes_netdev_get_settings(struc u16 phy_data; et_cmd->duplex = DUPLEX_FULL; - et_cmd->port = PORT_MII; + et_cmd->port = PORT_MII; if (nesadapter->OneG_Mode) { et_cmd->speed = SPEED_1000; @@ -1401,13 +1403,13 @@ static int nes_netdev_get_settings(struc if ((nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) || (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_ARGUS)) { et_cmd->transceiver = XCVR_EXTERNAL; - et_cmd->port = PORT_FIBRE; - et_cmd->supported = SUPPORTED_FIBRE; + et_cmd->port = PORT_FIBRE; + 
et_cmd->supported = SUPPORTED_FIBRE; et_cmd->advertising = ADVERTISED_FIBRE; et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } else { et_cmd->transceiver = XCVR_INTERNAL; - et_cmd->supported = SUPPORTED_10000baseT_Full; + et_cmd->supported = SUPPORTED_10000baseT_Full; et_cmd->advertising = ADVERTISED_10000baseT_Full; et_cmd->phy_address = nesdev->mac_index; } @@ -1438,7 +1440,7 @@ static int nes_netdev_set_settings(struc /* Turn on Full duplex, Autoneg, and restart autonegotiation */ phy_data |= 0x1300; } else { - // Turn off autoneg + /* Turn off autoneg */ phy_data &= ~0x1000; } nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index ee74f7c..3436430 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -1266,7 +1266,7 @@ static struct ib_qp *nes_create_qp(struc sq_size = init_attr->cap.max_send_wr; rq_size = init_attr->cap.max_recv_wr; - // check if the encoded sizes are OK or not... + /* check if the encoded sizes are OK or not... */ sq_encoded_size = nes_get_encoded_size(&sq_size); rq_encoded_size = nes_get_encoded_size(&rq_size); From rdreier at cisco.com Tue Apr 29 13:32:19 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 13:32:19 -0700 Subject: [ofa-general] Re: [ PATCH 3/3 v2 ] RDMA/nes Formatting cleanup In-Reply-To: <200804292025.m3TKP1im023075@velma.neteffect.com> (Glenn Streiff's message of "Tue, 29 Apr 2008 15:25:01 -0500") References: <200804292025.m3TKP1im023075@velma.neteffect.com> Message-ID: All looks fine, I applied all three of your patches. 
Thanks From rdreier at cisco.com Tue Apr 29 13:57:33 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 13:57:33 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a last batch of changes before 2.6.26-rc1: Eli Cohen (2): IPoIB: Use separate CQ for UD send completions IPoIB: Copy child MTU from parent Eli Dorfman (2): IB/iser: Move high-volume debug output to higher debug level IB/iser: Count FMR alignment violations per session Eric Schneider (1): RDMA/nes: Add support for SFP+ PHY Faisal Latif (1): RDMA/nes: Use LRO Glenn Streiff (1): RDMA/nes: Formatting cleanup Hoang-Nam Nguyen (1): IB/ehca: handle negative return value from ibmebus_request_irq() properly Olaf Kirch (2): mlx4_core: Avoid recycling old FMR R_Keys too soon IB/mthca: Avoid recycling old FMR R_Keys too soon Roland Dreier (1): IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute Stefan Roscher (1): IB/ehca: Allocate event queue size depending on max number of CQs and QPs Steve Wise (3): RDMA/cxgb3: Correctly serialize peer abort path RDMA/cxgb3: Set the max_mr_size device attribute correctly RDMA/cxgb3: Support peer-2-peer connection setup Yevgeny Petrilin (1): mlx4_core: Add a way to set the "collapsed" CQ flag drivers/infiniband/hw/cxgb3/cxio_hal.c | 18 ++- drivers/infiniband/hw/cxgb3/cxio_hal.h | 1 + drivers/infiniband/hw/cxgb3/cxio_wr.h | 21 ++- drivers/infiniband/hw/cxgb3/iwch.c | 1 + drivers/infiniband/hw/cxgb3/iwch.h | 1 + drivers/infiniband/hw/cxgb3/iwch_cm.c | 167 ++++++++---- drivers/infiniband/hw/cxgb3/iwch_cm.h | 2 + drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 +- drivers/infiniband/hw/cxgb3/iwch_provider.h | 3 + drivers/infiniband/hw/cxgb3/iwch_qp.c | 60 ++++- 
drivers/infiniband/hw/ehca/ehca_classes.h | 5 + drivers/infiniband/hw/ehca/ehca_cq.c | 11 + drivers/infiniband/hw/ehca/ehca_eq.c | 35 ++-- drivers/infiniband/hw/ehca/ehca_main.c | 36 +++- drivers/infiniband/hw/ehca/ehca_qp.c | 26 ++- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mthca/mthca_mr.c | 13 - drivers/infiniband/hw/mthca/mthca_provider.c | 14 +- drivers/infiniband/hw/mthca/mthca_provider.h | 1 + drivers/infiniband/hw/mthca/mthca_user.h | 10 +- drivers/infiniband/hw/nes/Kconfig | 1 + drivers/infiniband/hw/nes/nes.c | 4 + drivers/infiniband/hw/nes/nes.h | 5 +- drivers/infiniband/hw/nes/nes_cm.c | 8 +- drivers/infiniband/hw/nes/nes_hw.c | 371 ++++++++++++++++++++------ drivers/infiniband/hw/nes/nes_hw.h | 19 +- drivers/infiniband/hw/nes/nes_nic.c | 180 ++++++++----- drivers/infiniband/hw/nes/nes_utils.c | 10 +- drivers/infiniband/hw/nes/nes_verbs.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib.h | 7 +- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 8 +- drivers/infiniband/ulp/ipoib/ipoib_ethtool.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 45 ++-- drivers/infiniband/ulp/ipoib/ipoib_main.c | 3 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 39 ++- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 + drivers/infiniband/ulp/iser/iscsi_iser.c | 4 +- drivers/infiniband/ulp/iser/iscsi_iser.h | 7 + drivers/infiniband/ulp/iser/iser_memory.c | 9 +- drivers/net/cxgb3/version.h | 2 +- drivers/net/mlx4/cq.c | 4 +- drivers/net/mlx4/mr.c | 6 - include/linux/mlx4/device.h | 3 +- include/scsi/libiscsi.h | 1 + 44 files changed, 845 insertions(+), 327 deletions(-) From rdreier at cisco.com Tue Apr 29 14:41:48 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 14:41:48 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: (Roland Dreier's message of "Tue, 29 Apr 2008 11:10:32 -0700") References: <1209370487.11248.1.camel@mtls03> Message-ID: Umm... 
a little late now that I asked Linus to pull but I realized that this patch is somewhat buggy by design: You make the send CQ unsignaled, so you never get TX completion events. But this means that if the send queue ever fills up, we'll do netif_stop_queue() and then never reap a TX completion to restart the queue... I guess if we do netif_stop_queue() then we had better start a timer or something to kick us sometime in the future. Or we could request an event for the send CQ only when the send queue is full. But polling from either a timer or CQ event leads to locking issues against polling from the send path... - R. From rdreier at cisco.com Tue Apr 29 14:49:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 14:49:37 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: (Roland Dreier's message of "Tue, 29 Apr 2008 14:41:48 -0700") References: <1209370487.11248.1.camel@mtls03> Message-ID: By the way, this isn't just theoretical -- I'm not smart enough to realize this except that I just saw: ib1: TX ring full, stopping kernel net queue NETDEV WATCHDOG: ib1: transmit timed out ib1: transmit timeout: latency 1240 msecs ib1: queue stopped 1, tx_head 5291313, tx_tail 5291255 and of course it never recovers. 
From akepner at sgi.com Tue Apr 29 15:16:22 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Tue, 29 Apr 2008 15:16:22 -0700 Subject: IPoIB - "TX ring full" (was: Re: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD) In-Reply-To: References: <1209370487.11248.1.camel@mtls03> Message-ID: <20080429221622.GL30919@sgi.com> On Tue, Apr 29, 2008 at 02:49:37PM -0700, Roland Dreier wrote: > By the way, this isn't just theoretical -- I'm not smart enough to > realize this except that I just saw: > > ib1: TX ring full, stopping kernel net queue > NETDEV WATCHDOG: ib1: transmit timed out > ib1: transmit timeout: latency 1240 msecs > ib1: queue stopped 1, tx_head 5291313, tx_tail 5291255 > It's very interesting to me that you mention this. I'm in the midst of debugging a similar problem, but with IPoIB circa OFED 1.2. Found 2 problems: 1) In connected mode it's possible to get into a situation where one (or more) IPoIB-CM send queues fill up (no completions ever happen for them for some reason), while all the other CM send queues are empty. Of course the empty TX queues don't generate completions either, so nothing ever restarts the xmit queue and one bad connection kills IPoIB. We have had IPoIB stuck "forever" in this situation. Simple, brutal fix is to do ipoib_flush_paths() in ipoib_timeout(). 2) We also see situations very similar to what you describe above. The IPoIB-UD send queue fills and never restarts. (Of course it's nothing to do with the patch that was being discussed in this thread, this is with OFED 1.2-rc2, and also OFED 1.2.) I don't see how case (2) is possible with circa OFED 1.2 code. Can anyone clue me in? 
-- Arthur From arlin.r.davis at intel.com Tue Apr 29 19:45:27 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 29 Apr 2008 19:45:27 -0700 Subject: [ofa-general] [PATCH 1/1][dat1.2] dat: cleanup error handling with static registry parsing of dat.conf Message-ID: <000001c8aa6c$41658020$daba020a@amr.corp.intel.com> change asserts to return codes, add log messages, and report errors via open instead of asserts during dat library load. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dat/udat/linux/dat_osd.c | 2 +- dat/udat/udat_sr_parser.c | 382 +++++++++++++++++---------------------------- 2 files changed, 144 insertions(+), 240 deletions(-) diff --git a/dat/udat/linux/dat_osd.c b/dat/udat/linux/dat_osd.c index d6a5747..e1725e5 100644 --- a/dat/udat/linux/dat_osd.c +++ b/dat/udat/linux/dat_osd.c @@ -76,7 +76,7 @@ typedef enum * * *********************************************************************/ -static DAT_OS_DBG_TYPE_VAL g_dbg_type = 0; +static DAT_OS_DBG_TYPE_VAL g_dbg_type = DAT_OS_DBG_TYPE_ERROR; static DAT_OS_DBG_DEST g_dbg_dest = DAT_OS_DBG_DEST_STDOUT; diff --git a/dat/udat/udat_sr_parser.c b/dat/udat/udat_sr_parser.c index 64c4114..3959268 100644 --- a/dat/udat/udat_sr_parser.c +++ b/dat/udat/udat_sr_parser.c @@ -293,7 +293,7 @@ dat_sr_load (void) sr_file = dat_os_fopen (sr_path); if ( sr_file == NULL ) { - return DAT_INTERNAL_ERROR; + goto bail; } for (;;) @@ -308,17 +308,22 @@ dat_sr_load (void) } else { - dat_os_assert (!"unable to parse static registry file"); - break; + goto cleanup; } } - if ( 0 != dat_os_fclose (sr_file) ) - { - return DAT_INTERNAL_ERROR; - } + if (0 != dat_os_fclose (sr_file)) + goto bail; return DAT_SUCCESS; + +cleanup: + dat_os_fclose(sr_file); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + "ERROR: unable to parse static registry file, dat.conf\n"); + return DAT_INTERNAL_ERROR; + } @@ -570,33 +575,22 @@ dat_sr_parse_ia_name ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - if ( 
DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->ia_name = token.value; - - status = DAT_SUCCESS; - } - - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } + entry->ia_name = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " ia_name, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -610,39 +604,26 @@ dat_sr_parse_api ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_api ( - token.value, &entry->api_version) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_api(token.value, &entry->api_version)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " api_ver, file 
offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -656,39 +637,27 @@ dat_sr_parse_thread_safety ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_thread_safety ( - token.value, &entry->is_thread_safe) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_thread_safety( + token.value, &entry->is_thread_safe)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " thread_safety, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -702,39 +671,26 @@ dat_sr_parse_default ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_default ( - token.value, &entry->is_default) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - 
DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_default(token.value, &entry->is_default)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " default section, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -748,33 +704,22 @@ dat_sr_parse_lib_path ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->lib_path = token.value; - - status = DAT_SUCCESS; - } - - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SUCCESS != dat_sr_get_token(file, &token)) + goto bail; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } + entry->lib_path = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " lib_path, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } /*********************************************************************** @@ -787,42 +732,29 @@ dat_sr_parse_provider_version ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else 
if ( DAT_SUCCESS != dat_sr_convert_provider_version ( - token.value, &entry->provider_version) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - - status = DAT_SUCCESS; - } + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_provider_version( + token.value, &entry->provider_version)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " provider_ver, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } - /*********************************************************************** * Function: dat_sr_parse_ia_params ***********************************************************************/ @@ -833,33 +765,23 @@ dat_sr_parse_ia_params ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->ia_params = token.value; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status = DAT_SUCCESS; + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + entry->ia_params = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " ia_params, file 
offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -873,33 +795,23 @@ dat_sr_parse_platform_params ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->platform_params = token.value; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status = DAT_SUCCESS; + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + entry->platform_params = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " platform_params, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -913,32 +825,23 @@ dat_sr_parse_eoe ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( (DAT_SR_TOKEN_EOF != token.type) && - (DAT_SR_TOKEN_EOR != token.type) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - status = DAT_SUCCESS; - } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if ((DAT_SR_TOKEN_EOF != token.type) && + (DAT_SR_TOKEN_EOR != token.type)) { + dat_sr_put_token (file, &token); + goto bail; } + + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " EOR, EOF, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -954,7 +857,8 @@ dat_sr_convert_api ( 
int i; int minor_i; - dat_os_assert ( 0 < dat_os_strlen (str) ); + if (dat_os_strlen(str) <= 0) + return DAT_INTERNAL_ERROR; if ( 'u' == str[0] ) { @@ -1078,8 +982,8 @@ dat_sr_convert_provider_version ( int i; int decimal_i; - dat_os_assert ( 0 < dat_os_strlen (str) ); - dat_os_assert ( NULL == provider_version->id ); + if ((dat_os_strlen(str) <= 0) || (NULL != provider_version->id)) + return DAT_INTERNAL_ERROR; status = DAT_SUCCESS; -- 1.5.2.5 From arlin.r.davis at intel.com Tue Apr 29 19:45:30 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 29 Apr 2008 19:45:30 -0700 Subject: [ofa-general] [PATCH 1/1][v2.0] dat: cleanup error handling with static registry parsing of dat.conf Message-ID: change asserts to return codes, add log messages, and report errors via open instead of asserts during dat library load. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dat/udat/linux/dat_osd.c | 2 +- dat/udat/udat_sr_parser.c | 382 +++++++++++++++++---------------------------- 2 files changed, 144 insertions(+), 240 deletions(-) diff --git a/dat/udat/linux/dat_osd.c b/dat/udat/linux/dat_osd.c index fa76c12..7305168 100644 --- a/dat/udat/linux/dat_osd.c +++ b/dat/udat/linux/dat_osd.c @@ -76,7 +76,7 @@ typedef enum * * *********************************************************************/ -static DAT_OS_DBG_TYPE_VAL g_dbg_type = 0; +static DAT_OS_DBG_TYPE_VAL g_dbg_type = DAT_OS_DBG_TYPE_ERROR; static DAT_OS_DBG_DEST g_dbg_dest = DAT_OS_DBG_DEST_STDOUT; diff --git a/dat/udat/udat_sr_parser.c b/dat/udat/udat_sr_parser.c index 5761e3b..904acff 100644 --- a/dat/udat/udat_sr_parser.c +++ b/dat/udat/udat_sr_parser.c @@ -297,7 +297,7 @@ dat_sr_load (void) sr_file = dat_os_fopen (sr_path); if ( sr_file == NULL ) { - return DAT_INTERNAL_ERROR; + goto bail; } for (;;) @@ -312,17 +312,22 @@ dat_sr_load (void) } else { - dat_os_assert (!"unable to parse static registry file"); - break; + goto cleanup; } } - if ( 0 != dat_os_fclose (sr_file) ) - { - return 
DAT_INTERNAL_ERROR; - } + if (0 != dat_os_fclose (sr_file)) + goto bail; return DAT_SUCCESS; + +cleanup: + dat_os_fclose(sr_file); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + "ERROR: unable to parse static registry file, dat.conf\n"); + return DAT_INTERNAL_ERROR; + } @@ -574,33 +579,22 @@ dat_sr_parse_ia_name ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->ia_name = token.value; - - status = DAT_SUCCESS; - } - - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } + entry->ia_name = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " ia_name, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -614,39 +608,26 @@ dat_sr_parse_api ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_api ( - token.value, &entry->api_version) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( 
DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_api(token.value, &entry->api_version)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " api_ver, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -660,39 +641,27 @@ dat_sr_parse_thread_safety ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_thread_safety ( - token.value, &entry->is_thread_safe) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_thread_safety( + token.value, &entry->is_thread_safe)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " thread_safety, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -706,39 +675,26 @@ dat_sr_parse_default ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type 
) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_default ( - token.value, &entry->is_default) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_default(token.value, &entry->is_default)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " default section, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -752,33 +708,22 @@ dat_sr_parse_lib_path ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->lib_path = token.value; - - status = DAT_SUCCESS; - } - - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SUCCESS != dat_sr_get_token(file, &token)) + goto bail; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } + entry->lib_path = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " lib_path, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } 
/*********************************************************************** @@ -791,42 +736,29 @@ dat_sr_parse_provider_version ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_provider_version ( - token.value, &entry->provider_version) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - - status = DAT_SUCCESS; - } + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_provider_version( + token.value, &entry->provider_version)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " provider_ver, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } - /*********************************************************************** * Function: dat_sr_parse_ia_params ***********************************************************************/ @@ -837,33 +769,23 @@ dat_sr_parse_ia_params ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->ia_params = token.value; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status = DAT_SUCCESS; + if 
(DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + entry->ia_params = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " ia_params, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -877,33 +799,23 @@ dat_sr_parse_platform_params ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->platform_params = token.value; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status = DAT_SUCCESS; + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + entry->platform_params = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " platform_params, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -917,32 +829,23 @@ dat_sr_parse_eoe ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( (DAT_SR_TOKEN_EOF != token.type) && - (DAT_SR_TOKEN_EOR != token.type) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - status = DAT_SUCCESS; - } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status_success = dat_sr_put_token 
(file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if ((DAT_SR_TOKEN_EOF != token.type) && + (DAT_SR_TOKEN_EOR != token.type)) { + dat_sr_put_token (file, &token); + goto bail; } + + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " EOR, EOF, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -958,7 +861,8 @@ dat_sr_convert_api ( int i; int minor_i; - dat_os_assert ( 0 < dat_os_strlen (str) ); + if (dat_os_strlen(str) <= 0) + return DAT_INTERNAL_ERROR; if ( 'u' == str[0] ) { @@ -1082,8 +986,8 @@ dat_sr_convert_provider_version ( int i; int decimal_i; - dat_os_assert ( 0 < dat_os_strlen (str) ); - dat_os_assert ( NULL == provider_version->id ); + if ((dat_os_strlen(str) <= 0) || (NULL != provider_version->id)) + return DAT_INTERNAL_ERROR; status = DAT_SUCCESS; -- 1.5.2.5 From okir at lst.de Tue Apr 29 22:44:58 2008 From: okir at lst.de (Olaf Kirch) Date: Wed, 30 Apr 2008 07:44:58 +0200 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: References: <200804241106.57172.okir@lst.de> <200804241109.52448.okir@lst.de> Message-ID: <200804300744.59654.okir@lst.de> On Tuesday 29 April 2008 20:25:49 Roland Dreier wrote: > > Content-Transfer-Encoding: quoted-printable > > ugh, mangled patch. Argh, sorry. 
/me whacks his WIMPish mailer Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From erezz at voltaire.com Wed Apr 30 04:54:14 2008 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 30 Apr 2008 14:54:14 +0300 Subject: [Stgt-devel] [ofa-general] Re: [Ips] Calculating the VA in iSER header References: <4804B03C.6060507@voltaire.com><694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com><20080416144830.GC23861@osc.edu> <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> <20080429170516.GA8857@osc.edu> Message-ID: <39C75744D164D948A170E9792AF8E7CAF60D50@exil.voltaire.com> > Hi all, > > It appears the current Linux iSER initiator does not send the HELLO message when the connection transits to full feature phase. The stgt target also ignores this message (if it were to appear). Both of these implementations use a non-conformant iSER header (they add write_va and read_va fields, which incidentally do not appear to be used). Are these changes documented anywhere in the IB domain, or are these variations needed for another reason? > > If these deviations from the RFC are not needed and were to be fixed (along with the offset fix), then these implementations can detect the current mode of operation by examining the size > of the iSER header received. The choice to proceed in the broken way, or to terminate the connection (with big loud error messages) is the implementor's choice. Either way, the issue is detected and corruption avoided. > > Thoughts? 
Take a look at the iSER for IB annex: http://www.infinibandta.org/members/spec/Annex_iSER.PDF Erez From kensandars at hotmail.com Wed Apr 30 00:43:19 2008 From: kensandars at hotmail.com (Ken Sandars) Date: Wed, 30 Apr 2008 17:43:19 +1000 Subject: ***SPAM*** RE: [Stgt-devel] [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: <20080429170516.GA8857@osc.edu> References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> <20080416144830.GC23861@osc.edu> <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> <20080429170516.GA8857@osc.edu> Message-ID: Hi all, It appears the current Linux iSER initiator does not send the HELLO message when the connection transits to full feature phase. The stgt target also ignores this message (if it were to appear). Both of these implementations use a non-conformant iSER header (they add write_va and read_va fields, which incidentally do not appear to be used). Are these changes documented anywhere in the IB domain, or are these variations needed for another reason? If these deviations from the RFC are not needed and were to be fixed (along with the offset fix), then these implementations can detect the current mode of operation by examining the size of the iSER header received. The choice to proceed in the broken way, or to terminate the connection (with big loud error messages) is the implementor's choice. Either way, the issue is detected and corruption avoided. Thoughts? 
Cheers Ken > Date: Tue, 29 Apr 2008 13:05:16 -0400 > From: pw at osc.edu > To: dorfman.eli at gmail.com > CC: stgt-devel at lists.berlios.de; rdreier at cisco.com; general at lists.openfabrics.org; mako at almaden.ibm.com; ips at ietf.org; open-iscsi at googlegroups.com > Subject: Re: [Stgt-devel] [ofa-general] Re: [Ips] Calculating the VA in iSER header > > dorfman.eli at gmail.com wrote on Thu, 17 Apr 2008 14:13 +0300: > > On Wed, Apr 16, 2008 at 6:46 PM, Roland Dreier wrote: > > > > Agree with the interpretation of the spec, and it's probably a bit > > > > clearer that way too. But we have working initiators and targets > > > > that do it the "wrong" way. > > > > > > Yes... I guess the key question is whether there are any initiators that > > > do things the "right" way. > > > > > > > > > > 1. Flag day: all initiators and targets change at the same time. > > > > Will see data corruption if someone unluckily runs one or the other > > > > using old non-fixed code. > > > > > > Seems unacceptable to me... it doesn't make sense at all to break every > > > setup in the world just to be "right" according to the spec. > > > > This will break only when both initiator and target will use > > InitialR2T=No, which means allow unsolicited data. > > As far as I know, STGT is not very common (and its version in RHEL5.1 > > is considered experimental). Its default is also InitialR2T=Yes. > > Voltaire's iSCSI over iSER target also uses default InitialR2T=Yes. > > So it seems that nothing will break. > > I finally got a chance to look at this just now. I think you mean > default is InitialR2T=No above, which means no unsolicited data. > That is the default case, and true, the two different meanings > of the initiator-supplied VA coincide. > > But you missed the impact of immediate data. We run with the > defaults (I think) that say the first write request packet should be > filled with a bit of the coming data stream. 
From iscsid.conf: > > # To enable immediate data (i.e., the initiator sends unsolicited data > # with the iSCSI command packet), uncomment the following line: > # > # The default is Yes > node.session.iscsi.ImmediateData = Yes > > Looking at the offset printed out by your patch, it is indeed > non-zero for the first RDMA read. Please correct me if I am > mistaken about this---you must have tested all four variations of > with and without the patches on initiator and target side, but I did > not. > > Hence I am still a bit unhappy about having to deal with the > fallout, with no way to detect it. For our local use, I'll keep an > older version of stgt in use until we switch to a new kernel, then > merge up the target side change. It is a bother, but I can deal > with it. For other institutions, this lockstep upgrade requirement > will not be obvious until they debug the resulting data corruption. > > Still, I do understand why it would be nice to conform to the spec, > and it is maybe a bit cleaner that way too. Maybe you can help with > the bug reports on stgt-devel during the transition, and maintain > and publish a patch to let it work with old kernels. > > -- Pete > _______________________________________________ > Stgt-devel mailing list > Stgt-devel at lists.berlios.de > https://lists.berlios.de/mailman/listinfo/stgt-devel _________________________________________________________________ Find the job of your dreams before someone else does http://mycareer.com.au/?s_cid=596064 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pw at osc.edu Wed Apr 30 07:08:25 2008 From: pw at osc.edu (Pete Wyckoff) Date: Wed, 30 Apr 2008 10:08:25 -0400 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations In-Reply-To: References: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> Message-ID: <20080430140825.GB19339@osc.edu> rdreier at cisco.com wrote on Mon, 28 Apr 2008 08:51 -0700: > > Add read only module param to count alignment violations. > > I don't think a module parameter is the way to report statistics from > the kernel. Can't you just add a device attribute or something? Or > stick a file in debugfs? This is definitely a worthwhile change though. By monitoring this statistic we were able to get good insight into what our apps are doing to cause these alignment violations. I have a hacky patch that tries to export it via sysfs, but it doesn't clean up properly. The iscsi transport class defines the sysfs tree and doesn't give hooks to a particular device to add/change those entries, which is why this approach came out rather ugly. Hope Eli is willing to do this the right way; maybe debugfs is the way to go. -- Pete From monis at Voltaire.COM Wed Apr 30 07:12:42 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 30 Apr 2008 17:12:42 +0300 Subject: [ofa-general] [PATCH] IB/core: handle race between elements in work queues after event Message-ID: <48187E5A.7040809@Voltaire.COM> This patch solves a race between elements in work queues that are carried out after an event occurs. When the SM address handle becomes invalid and needs an update, it is set to NULL; until update_sm_ah() is called, any request that needs sm_ah is answered with an -EAGAIN return status.
Signed-off-by: Moni Levy Signed-off-by: Moni Shoua --- drivers/infiniband/core/sa_query.c | 28 ++++++++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index cf474ec..19439d8 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -407,15 +407,27 @@ static void update_sm_ah(struct work_str static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event) { + if (event->event == IB_EVENT_PORT_ERR || event->event == IB_EVENT_PORT_ACTIVE || event->event == IB_EVENT_LID_CHANGE || event->event == IB_EVENT_PKEY_CHANGE || event->event == IB_EVENT_SM_CHANGE || event->event == IB_EVENT_CLIENT_REREGISTER) { - struct ib_sa_device *sa_dev; - sa_dev = container_of(handler, typeof(*sa_dev), event_handler); - + unsigned long flags; + struct ib_sa_device *sa_dev = + container_of(handler, typeof(*sa_dev), event_handler); + struct ib_sa_port *port = + &sa_dev->port[event->element.port_num - sa_dev->start_port]; + struct ib_sa_sm_ah *sm_ah; + + spin_lock_irqsave(&port->ah_lock, flags); + sm_ah = port->sm_ah; + port->sm_ah = NULL; + spin_unlock_irqrestore(&port->ah_lock, flags); + + if (sm_ah) + kref_put(&sm_ah->ref, free_sm_ah); schedule_work(&sa_dev->port[event->element.port_num - sa_dev->start_port].update_task); } @@ -780,6 +792,10 @@ int ib_sa_service_rec_query(struct ib_sa return -ENODEV; port = &sa_dev->port[port_num - sa_dev->start_port]; + if (!port->sm_ah) { + return -EAGAIN; + } + agent = port->agent; if (method != IB_MGMT_METHOD_GET && @@ -877,8 +893,12 @@ int ib_sa_mcmember_rec_query(struct ib_s return -ENODEV; port = &sa_dev->port[port_num - sa_dev->start_port]; - agent = port->agent; + if (!port->sm_ah) { + return -EAGAIN; + } + agent = port->agent; + query = kmalloc(sizeof *query, gfp_mask); if (!query) return -ENOMEM; From ogerlitz at voltaire.com Wed Apr 30 07:16:40 2008 From: ogerlitz at voltaire.com (Or 
Gerlitz) Date: Wed, 30 Apr 2008 17:16:40 +0300 Subject: [ofa-general] Re: [PATCH] IB/core: handle race between elements in work queues after event In-Reply-To: <48187E5A.7040809@Voltaire.COM> References: <48187E5A.7040809@Voltaire.COM> Message-ID: <48187F48.1090701@voltaire.com> Moni Shoua wrote: > any request that needs sm_ah is replied with -EAGAIN return status. what about ib_sa_path_rec_get() Or. From monis at Voltaire.COM Wed Apr 30 07:37:06 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 30 Apr 2008 17:37:06 +0300 Subject: [ofa-general] Re: [PATCH] IB/core: handle race between elements in work queues after event In-Reply-To: <48187F48.1090701@voltaire.com> References: <48187E5A.7040809@Voltaire.COM> <48187F48.1090701@voltaire.com> Message-ID: <48188412.305@Voltaire.COM> Or Gerlitz wrote: > Moni Shoua wrote: >> any request that needs sm_ah is replied with -EAGAIN return status. > what about ib_sa_path_rec_get() Could you please be more specific? What did I miss? From monis at Voltaire.COM Wed Apr 30 07:43:49 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 30 Apr 2008 17:43:49 +0300 Subject: [ofa-general] Re: [PATCH] IB/core: handle race between elements in work queues after event In-Reply-To: <48188412.305@Voltaire.COM> References: <48187E5A.7040809@Voltaire.COM> <48187F48.1090701@voltaire.com> <48188412.305@Voltaire.COM> Message-ID: <481885A5.3070001@Voltaire.COM>
thanks From daniela at georgex.org Wed Apr 30 07:44:57 2008 From: daniela at georgex.org (Daniela George) Date: Wed, 30 Apr 2008 07:44:57 -0700 Subject: [ofa-general] Re: HP PCI-X 2-port 4X Fabric (HPC) Adapter In-Reply-To: References: Message-ID: <1209566697.6137.32.camel@blue> David, The best place to address this question is to the OpenFabrics general list (general at lists.openfabrics.org). I have cc'd that list. Thanks, Daniela On Wed, 2008-04-30 at 09:17 -0400, Shue, David CTR USAF AFMC AFRL/RITB wrote: > I have used the OFED-1.3 software to communicate to the Mellanox HPC I > use. However, the OFED-1.3 does not appear to work with the subject > HPC card. The card is an HPC 380299-B21. Is there any information > you may provide in how to communicate to this card? > > > > Thank you. > > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > David Shue > > Systems Specialist > > Computer Sciences Corporation > > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > > > From rdreier at cisco.com Wed Apr 30 07:59:49 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 07:59:49 -0700 Subject: [ofa-general] Re: HP PCI-X 2-port 4X Fabric (HPC) Adapter In-Reply-To: <1209566697.6137.32.camel@blue> (Daniela George's message of "Wed, 30 Apr 2008 07:44:57 -0700") References: <1209566697.6137.32.camel@blue> Message-ID: > > I have used the OFED-1.3 software to communicate to the Mellanox HPC I > > use. However, the OFED-1.3 does not appear to work with the subject > > HPC card. The card is an HPC 380299-B21. Is there any information > > you may provide in how to communicate to this card? What does lspci -vvvnn show for this card? What do you mean by "does not appear to work"? How does it fail exactly? - R. 
From pw at osc.edu Wed Apr 30 08:01:16 2008 From: pw at osc.edu (Pete Wyckoff) Date: Wed, 30 Apr 2008 11:01:16 -0400 Subject: [ofa-general] [PATCH] IB/iSER: Count fmr alignment violations per session In-Reply-To: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> References: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> Message-ID: <20080430150116.GA22791@osc.edu> dorfman.eli at gmail.com wrote on Tue, 29 Apr 2008 10:33 +0300: > Count fmr alignment violations per session > as part of the iscsi statistics. > > Signed-off-by: Eli Dorfman Brilliant. Thanks for this. -- Pete From rdreier at cisco.com Wed Apr 30 08:07:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 08:07:23 -0700 Subject: [ofa-general] Re: [PATCH] IB/core: handle race between elements in work queues after event In-Reply-To: <48187E5A.7040809@Voltaire.COM> (Moni Shoua's message of "Wed, 30 Apr 2008 17:12:42 +0300") References: <48187E5A.7040809@Voltaire.COM> Message-ID: > This patch solves a race between elements in work queues that are > carried out after an event occurs. When SM address handle becomes > invalid and needs an update it is set to NULL and until update_sm_ah() > is called, any request that needs sm_ah is replied with -EAGAIN return > status. What is the race? What is the effect of the race? Don't expect me to be psychic and guess what you're fixing. And if there is more information in an email thread or bugzilla entry, please include a link to it. Can this race between work queue entries be solved in a simpler way just by using a single-threaded workqueue? Your patch doesn't seem to change any consumers of this code. How do they cope with a -EAGAIN return value? > static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event) > { > + > if (event->event == IB_EVENT_PORT_ERR || No need to add a blank line here. > + if (!port->sm_ah) { > + return -EAGAIN; > + } No need for braces here.
> + agent = port->agent; > + > query = kmalloc(sizeof *query, gfp_mask); blank line has trailing whitespace. Please investigate using checkpatch.pl. - R. From eli at dev.mellanox.co.il Wed Apr 30 09:06:26 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 19:06:26 +0300 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: References: <1209370487.11248.1.camel@mtls03> Message-ID: <1209571586.1790.5.camel@mtls03> On Tue, 2008-04-29 at 14:49 -0700, Roland Dreier wrote: > By the way, this isn't just theoretical -- I'm not smart enough to > realize this except that I just saw: > > ib1: TX ring full, stopping kernel net queue > NETDEV WATCHDOG: ib1: transmit timed out > ib1: transmit timeout: latency 1240 msecs > ib1: queue stopped 1, tx_head 5291313, tx_tail 5291255 > > and of course it never recovers. I started working on a fix for this by arming the send CQ when the QP reaches 63 outstanding requests and draining the CQ at the completion handler while holding priv->tx_lock. But I had another strange problem that I don't understand. If I just load and unload ib_ipoib, the system crashes showing messages that appear like there has been a memory corruption. If I comment out destroying the send CQ at ipoib_transport_dev_cleanup() the crashes disappear. Do you see this as well? From rdreier at cisco.com Wed Apr 30 09:14:10 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 09:14:10 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: <1209571586.1790.5.camel@mtls03> (Eli Cohen's message of "Wed, 30 Apr 2008 19:06:26 +0300") References: <1209370487.11248.1.camel@mtls03> <1209571586.1790.5.camel@mtls03> Message-ID: > I started working on a fix for this by arming the send CQ when the QP > reaches 63 outstanding requests and draining the CQ at the completion > handler while holding priv->tx_lock. 
OK (I hope 63 is replaced with something that is computed based on other constants though -- like you could arm the CQ when you're about to do netif_stop_queue())... seems like it should work. > But I had another strange problem that I don't understand. If I just > load and unload ib_ipoib, the system crashes showing messages that > appear like there has been a memory corruption. If I comment out > destroying the send CQ at ipoib_transport_dev_cleanup() the crashes > disappear. Do you see this as well? Not here... what tree are you running? - R. From rdreier at cisco.com Wed Apr 30 09:15:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 09:15:32 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: <1209571586.1790.5.camel@mtls03> (Eli Cohen's message of "Wed, 30 Apr 2008 19:06:26 +0300") References: <1209370487.11248.1.camel@mtls03> <1209571586.1790.5.camel@mtls03> Message-ID: > But I had another strange problem that I don't understand. If I just > load and unload ib_ipoib, the system crashes showing messages that > appear like there has been a memory corruption. If I comment out > destroying the send CQ at ipoib_transport_dev_cleanup() the crashes > disappear. Do you see this as well? Actually maybe I just saw this happen -- it did look like memory corruption but it wasn't an immediate crash. Will try to investigate. - R. From rdreier at cisco.com Wed Apr 30 09:16:40 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 09:16:40 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: (Roland Dreier's message of "Wed, 30 Apr 2008 09:15:32 -0700") References: <1209370487.11248.1.camel@mtls03> <1209571586.1790.5.camel@mtls03> Message-ID: > Actually maybe I just saw this happen -- it did look like memory > corruption but it wasn't an immediate crash. By the way, what kind of HCA are you using in your system? mthca or mlx4? - R. 
From eli at dev.mellanox.co.il Wed Apr 30 09:22:30 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 19:22:30 +0300 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: References: <1209370487.11248.1.camel@mtls03> <1209571586.1790.5.camel@mtls03> Message-ID: <1209572550.1790.7.camel@mtls03> On Wed, 2008-04-30 at 09:16 -0700, Roland Dreier wrote: > > Actually maybe I just saw this happen -- it did look like memory > > corruption but it wasn't an immediate crash. > > By the way, what kind of HCA are you using in your system? mthca or > mlx4? > I have both ConnectX and Arbel. From michaelc at cs.wisc.edu Wed Apr 30 09:27:22 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 30 Apr 2008 11:27:22 -0500 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations In-Reply-To: <20080430140825.GB19339@osc.edu> References: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> <20080430140825.GB19339@osc.edu> Message-ID: <48189DEA.50108@cs.wisc.edu> Pete Wyckoff wrote: > rdreier at cisco.com wrote on Mon, 28 Apr 2008 08:51 -0700: >> > Add read only module param to count alignment violations. >> >> I don't think a module parameter is the way to report statistics from >> the kernel. Can't you just add a device attribute or something? Or >> stick a file in debugfs? > > This is definitely a worthwhile change though. By monitoring this > statistic we were able to get good insight to what our apps are > doing to cause these alignment violations. > > I have a hacky patch that tries to export it via sysfs, but it > doesn't clean up properly. The iscsi transport class defines the > sysfs tree and doesn't give hooks to a particular device to > add/change those entries, which is why this approach came out rather > ugly. Hope Eli is willing to do this the right way; maybe debugfs > is the way to go. > We have iscsi stats already. I thought Eli sent a patch to put this there already? 
If not then put this in the get_stats callout as one of the iser custom values. From michaelc at cs.wisc.edu Wed Apr 30 09:30:23 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 30 Apr 2008 11:30:23 -0500 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations In-Reply-To: <48189DEA.50108@cs.wisc.edu> References: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> <20080430140825.GB19339@osc.edu> <48189DEA.50108@cs.wisc.edu> Message-ID: <48189E9F.1020606@cs.wisc.edu> Mike Christie wrote: > Pete Wyckoff wrote: >> rdreier at cisco.com wrote on Mon, 28 Apr 2008 08:51 -0700: >>> > Add read only module param to count alignment violations. >>> >>> I don't think a module parameter is the way to report statistics from >>> the kernel. Can't you just add a device attribute or something? Or >>> stick a file in debugfs? >> This is definitely a worthwhile change though. By monitoring this >> statistic we were able to get good insight to what our apps are >> doing to cause these alignment violations. >> >> I have a hacky patch that tries to export it via sysfs, but it >> doesn't clean up properly. The iscsi transport class defines the >> sysfs tree and doesn't give hooks to a particular device to >> add/change those entries, which is why this approach came out rather >> ugly. Hope Eli is willing to do this the right way; maybe debugfs >> is the way to go. >> > > We have iscsi stats already. I thought Eli sent a patch to put this > there already? If not then put this in the get_stats callout as one of > the iser custom values. > Nevermind I see the other mail. 
From sean.hefty at intel.com Wed Apr 30 09:38:13 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 30 Apr 2008 09:38:13 -0700 Subject: [ofa-general] beginner resources In-Reply-To: <6978b4af0804230620p560c33c5hfa8385a57bbed80c@mail.gmail.com> References: <6978b4af0804230620p560c33c5hfa8385a57bbed80c@mail.gmail.com> Message-ID: <000a01c8aae0$95d30550$f2d8180a@amr.corp.intel.com> I had a look at the rping example and I'm trying to use Roland Dreier's examples. But my example simply doesn't work. I'm totally new to this so please bear with me. If someone has time to have a look at http://pastebin.com/m708b032c and http://pastebin.com/m13673097 It would be helpful if you explained what the problem is, and post the relevant code directly to the list. - Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From swise at opengridcomputing.com Wed Apr 30 09:45:55 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 11:45:55 -0500 Subject: [ofa-general] iwarp-specific async events Message-ID: <4818A243.1090201@opengridcomputing.com> Hey Roland, I'm looking for a good way to trigger iwarp QP flushing on a normal disconnect for user mode QPs. The async event notification provider ops function is one way I can do it easily with the current infrastructure, if we add some new event types. For example, if a fatal error occurs on a QP which causes the connection to be aborted, then the kernel driver will mark the user qp as "in error" and post a FATAL_QP event. When the app reaps that event, the libcxgb3 async event ops function will flush the user's qp. However, for a normal non-fatal close, no async event is posted. But one should be. The iWARP verbs specify many async event types that I think we need to add at some point. Case in point: LLP Close Complete (qp event) - The TCP connection completed and no SQ WQEs were flushed (normal close) There is a whole slew of other events.
The above event, however, is key in that libcxgb3 could trigger a qp flush when this event is reaped by the application. Currently, the flushing of the QP is only triggered by fatal connection errors as described above and/or if the application tries to post on a QP that has been marked in error by the kernel. However, if the app does neither, then the flush never happens. There are other ways to tackle this cxgb3 problem: - enabling the providers to get a callback on rdma-cm event reaping. So reaping the DISCONNECTED event would cause the qp to be flushed. - I could hack this into the cxgb3 provider kernel driver so it can mark a user mode CQ with state that tells it to go flush any QPs it owns that are in error. Thus the next time the application polls, the poll logic would go flush any qps in error. I'm opting for the simplest change, which I think is adding new async events and changing the iwarp driver to post them at the right times. Thoughts? Thanks, Steve. From swise at opengridcomputing.com Wed Apr 30 09:51:14 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 11:51:14 -0500 Subject: [ofa-general] beginner resources In-Reply-To: <000a01c8aae0$95d30550$f2d8180a@amr.corp.intel.com> References: <6978b4af0804230620p560c33c5hfa8385a57bbed80c@mail.gmail.com> <000a01c8aae0$95d30550$f2d8180a@amr.corp.intel.com> Message-ID: <4818A382.9060601@opengridcomputing.com> Is it time for someone to write an RDMA programming book? Do we have enough buyers yet? :) Steve. Sean Hefty wrote: > I had a look at the rping example and I'm trying to use Roland Dreier's examples. > > But my example simply doesn't work. I'm totally new to this so please bear with me. > > If someone has time to have a look at http://pastebin.com/m708b032c and http://pastebin.com/m13673097 > > > > > It would be helpful if you explained what the problem is, and post the relevant code directly to the list.
> > - Sean > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Wed Apr 30 10:13:15 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 10:13:15 -0700 Subject: [ofa-general] Re: iwarp-specific async events In-Reply-To: <4818A243.1090201@opengridcomputing.com> (Steve Wise's message of "Wed, 30 Apr 2008 11:45:55 -0500") References: <4818A243.1090201@opengridcomputing.com> Message-ID: > I'm looking for a good way to trigger iwarp QP flushing on a normal > disconnect for user mode QPs. The async event notification provider > ops function is one way I can do it easily with the currently > infrastructure, if we add some new event types. For example, if a > fatal error occurs on a QP which causes the connection to be aborted, > then the kernel driver will mark the user qp as "in error" and post a > FATAL_QP event. When the app reaps that event, the libcxgb3 async > event ops function will flush the user's qp. However for a normal non > fatal close, no async event is posted. But one should be. The iWARP > verbs specify many async event types that I think we need to add at > some point. Case in point: > > LLP Close Complete (qp event) - The TCP connection completed and no > SQ WQEs were flushed (normal close) Yeah, it makes sense just to add any iWARP events that make sense and don't fit the existing set of IB events. We already have IB-specific stuff for path migration etc. > There is a whole slew of other events. The above event, however, is > key in that libcxgb3 could trigger a qp flush when this event is > reaped by the application. 
Currently, the flushing of the QP is only > triggered by fatal connection errors as described above and/or if the > application tries to post on a QP that has been marked in error by the > kernel. However, if the app does neither, then the flush never > happens. On the other hand, how does cxgb3 know when an application has reaped the event? Do we need to add code to the uverbs module to know when an async event has reached userspace? - R. From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:15:52 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:45:52 +0530 Subject: [ofa-general] [PATCH 00/13] QLogic Virtual NIC (VNIC) Driver Message-ID: <20080430171028.31725.86190.stgit@localhost.localdomain> Roland, This is the QLogic Virtual NIC driver patch series which has been tested against your for-2.6.26 and for-2.6.27 branches. We intended these patches to make it to the 2.6.26 kernel, but if it is too late for the 2.6.26 merge window please consider them for 2.6.27. This patch series adds the QLogic Virtual NIC (VNIC) driver which works in conjunction with the QLogic Ethernet Virtual I/O Controller (EVIC) hardware. The VNIC driver, along with the QLogic EVIC's two 10 Gigabit Ethernet ports, enables InfiniBand clusters to connect to Ethernet networks. This driver also works with the earlier version of the I/O Controller, the VEx. The QLogic VNIC driver creates virtual Ethernet interfaces and tunnels the Ethernet data to/from the EVIC over InfiniBand using an InfiniBand reliable connection. The driver compiles cleanly with sparse endianness checking enabled. We have also tested the driver with lockdep checking enabled. We have run these patches through checkpatch.pl and the only warnings are related to lines slightly longer than 80 columns in some of the statements. The driver itself has been tested with long duration iperf, netperf TCP, UDP streams.
--- [PATCH 01/13] QLogic VNIC: Driver - netdev implementation [PATCH 02/13] QLogic VNIC: Netpath - abstraction of connection to EVIC/VEx [PATCH 03/13] QLogic VNIC: Implementation of communication protocol with EVIC/VEx [PATCH 04/13] QLogic VNIC: Implementation of Control path of communication protocol [PATCH 05/13] QLogic VNIC: Implementation of Data path of communication protocol [PATCH 06/13] QLogic VNIC: IB core stack interaction [PATCH 07/13] QLogic VNIC: Handling configurable parameters of the driver [PATCH 08/13] QLogic VNIC: sysfs interface implementation for the driver [PATCH 09/13] QLogic VNIC: IB Multicast for Ethernet broadcast/multicast [PATCH 10/13] QLogic VNIC: Driver Statistics collection [PATCH 11/13] QLogic VNIC: Driver utility file - implements various utility macros [PATCH 12/13] QLogic VNIC: Driver Kconfig and Makefile. [PATCH 13/13] QLogic VNIC: Modifications to IB Kconfig and Makefile drivers/infiniband/Kconfig | 2 drivers/infiniband/Makefile | 1 drivers/infiniband/ulp/qlgc_vnic/Kconfig | 28 drivers/infiniband/ulp/qlgc_vnic/Makefile | 13 drivers/infiniband/ulp/qlgc_vnic/vnic_config.c | 380 +++ drivers/infiniband/ulp/qlgc_vnic/vnic_config.h | 242 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_control.c | 2288 ++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_control.h | 180 ++ .../infiniband/ulp/qlgc_vnic/vnic_control_pkt.h | 368 +++ drivers/infiniband/ulp/qlgc_vnic/vnic_data.c | 1473 +++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_data.h | 206 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c | 1046 +++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h | 206 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_main.c | 1052 +++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_main.h | 167 + drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c | 332 +++ drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h | 76 + drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c | 112 + drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h | 80 + 
drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c | 234 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h | 497 ++++ drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c | 1127 ++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h | 62 + drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h | 103 + drivers/infiniband/ulp/qlgc_vnic/vnic_util.h | 251 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c | 1233 +++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h | 176 ++ 27 files changed, 11935 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/Kconfig create mode 100644 drivers/infiniband/ulp/qlgc_vnic/Makefile create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_config.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_config.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_data.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_data.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_main.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_main.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h create mode 100644 
drivers/infiniband/ulp/qlgc_vnic/vnic_util.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h -- Regards, Ram From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:16:24 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:46:24 +0530 Subject: [ofa-general] [PATCH 01/13] QLogic VNIC: Driver - netdev implementation In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171624.31725.98475.stgit@localhost.localdomain> From: Ramachandra K QLogic Virtual NIC Driver. This patch implements netdev registration, netdev functions and state maintenance of the QLogic Virtual NIC corresponding to the various events associated with the QLogic Ethernet Virtual I/O Controller (EVIC/VEx) connection. Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_main.c | 1052 ++++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_main.h | 167 ++++ 2 files changed, 1219 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_main.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_main.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_main.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_main.c new file mode 100644 index 0000000..393c79a --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_main.c @@ -0,0 +1,1052 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_netpath.h" +#include "vnic_viport.h" +#include "vnic_ib.h" +#include "vnic_stats.h" + +#define MODULEVERSION "1.3.0.0.4" +#define MODULEDETAILS \ + "QLogic Corp. 
Virtual NIC (VNIC) driver version " MODULEVERSION + +MODULE_AUTHOR("QLogic Corp."); +MODULE_DESCRIPTION(MODULEDETAILS); +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_SUPPORTED_DEVICE("QLogic Ethernet Virtual I/O Controller"); + +u32 vnic_debug; + +module_param(vnic_debug, uint, 0444); +MODULE_PARM_DESC(vnic_debug, "Enable debug tracing if > 0"); + +LIST_HEAD(vnic_list); + +static DECLARE_WAIT_QUEUE_HEAD(vnic_npevent_queue); +static LIST_HEAD(vnic_npevent_list); +static DECLARE_COMPLETION(vnic_npevent_thread_exit); +static spinlock_t vnic_npevent_list_lock; +static struct task_struct *vnic_npevent_thread; +static int vnic_npevent_thread_end; + + +void vnic_connected(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_connected()\n"); + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_CONNECTED); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_CONNECTED); + + vnic_connected_stats(vnic); +} + +void vnic_disconnected(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_disconnected()\n"); + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_DISCONNECTED); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_DISCONNECTED); +} + +void vnic_link_up(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_link_up()\n"); + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_LINKUP); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_LINKUP); +} + +void vnic_link_down(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_link_down()\n"); + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_LINKDOWN); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_LINKDOWN); +} + +void vnic_stop_xmit(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_stop_xmit()\n"); + if (netpath == vnic->current_path) { + if (vnic->xmit_started) { + netif_stop_queue(vnic->netdevice); + vnic->xmit_started = 0; + } + + vnic_stop_xmit_stats(vnic); + 
} +} + +void vnic_restart_xmit(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_restart_xmit()\n"); + if (netpath == vnic->current_path) { + if (!vnic->xmit_started) { + netif_wake_queue(vnic->netdevice); + vnic->xmit_started = 1; + } + + vnic_restart_xmit_stats(vnic); + } +} + +void vnic_recv_packet(struct vnic *vnic, struct netpath *netpath, + struct sk_buff *skb) +{ + VNIC_FUNCTION("vnic_recv_packet()\n"); + if ((netpath != vnic->current_path) || !vnic->open) { + VNIC_INFO("tossing packet\n"); + dev_kfree_skb(skb); + return; + } + + vnic->netdevice->last_rx = jiffies; + skb->dev = vnic->netdevice; + skb->protocol = eth_type_trans(skb, skb->dev); + if (!vnic->config->use_rx_csum) + skb->ip_summed = CHECKSUM_NONE; + netif_rx(skb); + vnic_recv_pkt_stats(vnic); +} + +static struct net_device_stats *vnic_get_stats(struct net_device *device) +{ + struct vnic *vnic; + struct netpath *np; + + VNIC_FUNCTION("vnic_get_stats()\n"); + vnic = (struct vnic *)device->priv; + + np = vnic->current_path; + if (np && np->viport && !np->cleanup_started) + viport_get_stats(np->viport, &vnic->stats); + return &vnic->stats; +} + +static int vnic_open(struct net_device *device) +{ + struct vnic *vnic; + + VNIC_FUNCTION("vnic_open()\n"); + vnic = (struct vnic *)device->priv; + + vnic->open++; + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_PRINP_SETLINK); + vnic->xmit_started = 1; + netif_start_queue(vnic->netdevice); + + return 0; +} + +static int vnic_stop(struct net_device *device) +{ + struct vnic *vnic; + int ret = 0; + + VNIC_FUNCTION("vnic_stop()\n"); + vnic = (struct vnic *)device->priv; + netif_stop_queue(device); + vnic->xmit_started = 0; + vnic->open--; + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_PRINP_SETLINK); + + return ret; +} + +static int vnic_hard_start_xmit(struct sk_buff *skb, + struct net_device *device) +{ + struct vnic *vnic; + struct netpath *np; + cycles_t xmit_time; + int ret = -1; + + VNIC_FUNCTION("vnic_hard_start_xmit()\n"); + 
vnic = (struct vnic *)device->priv; + np = vnic->current_path; + + vnic_pre_pkt_xmit_stats(&xmit_time); + + if (np && np->viport) + ret = viport_xmit_packet(np->viport, skb); + + if (ret) { + vnic_xmit_fail_stats(vnic); + dev_kfree_skb_any(skb); + vnic->stats.tx_dropped++; + goto out; + } + + device->trans_start = jiffies; + vnic_post_pkt_xmit_stats(vnic, xmit_time); +out: + return 0; +} + +static void vnic_tx_timeout(struct net_device *device) +{ + struct vnic *vnic; + + VNIC_FUNCTION("vnic_tx_timeout()\n"); + vnic = (struct vnic *)device->priv; + device->trans_start = jiffies; + + if (vnic->current_path->viport) + viport_failure(vnic->current_path->viport); + + VNIC_ERROR("vnic_tx_timeout\n"); +} + +static void vnic_set_multicast_list(struct net_device *device) +{ + struct vnic *vnic; + unsigned long flags; + + VNIC_FUNCTION("vnic_set_multicast_list()\n"); + vnic = (struct vnic *)device->priv; + + spin_lock_irqsave(&vnic->lock, flags); + if (device->mc_count == 0) { + if (vnic->mc_list_len) { + vnic->mc_list_len = vnic->mc_count = 0; + kfree(vnic->mc_list); + } + } else { + struct dev_mc_list *mc_list = device->mc_list; + int i; + + if (device->mc_count > vnic->mc_list_len) { + if (vnic->mc_list_len) + kfree(vnic->mc_list); + vnic->mc_list_len = device->mc_count + 10; + vnic->mc_list = kmalloc(vnic->mc_list_len * + sizeof *mc_list, GFP_ATOMIC); + if (!vnic->mc_list) { + vnic->mc_list_len = vnic->mc_count = 0; + VNIC_ERROR("failed allocating mc_list\n"); + goto failure; + } + } + vnic->mc_count = device->mc_count; + for (i = 0; i < device->mc_count; i++) { + vnic->mc_list[i] = *mc_list; + vnic->mc_list[i].next = &vnic->mc_list[i + 1]; + mc_list = mc_list->next; + } + } + spin_unlock_irqrestore(&vnic->lock, flags); + + if (vnic->primary_path.viport) + viport_set_multicast(vnic->primary_path.viport, + vnic->mc_list, vnic->mc_count); + + if (vnic->secondary_path.viport) + viport_set_multicast(vnic->secondary_path.viport, + vnic->mc_list, vnic->mc_count); + + 
vnic_npevent_queue_evt(&vnic->primary_path, VNIC_PRINP_SETLINK); + return; +failure: + spin_unlock_irqrestore(&vnic->lock, flags); +} + +/** + * Following set of functions queues up the events for EVIC and the + * kernel thread queuing up the event might return. + */ +static int vnic_set_mac_address(struct net_device *device, void *addr) +{ + struct vnic *vnic; + struct sockaddr *sockaddr = addr; + u8 *address; + int ret = -1; + + VNIC_FUNCTION("vnic_set_mac_address()\n"); + vnic = (struct vnic *)device->priv; + + if (!is_valid_ether_addr(sockaddr->sa_data)) + return -EADDRNOTAVAIL; + + if (netif_running(device)) + return -EBUSY; + + memcpy(device->dev_addr, sockaddr->sa_data, ETH_ALEN); + address = sockaddr->sa_data; + + if (vnic->primary_path.viport) + ret = viport_set_unicast(vnic->primary_path.viport, + address); + + if (ret) + return ret; + + if (vnic->secondary_path.viport) + viport_set_unicast(vnic->secondary_path.viport, address); + + vnic->mac_set = 1; + return 0; +} + +static int vnic_change_mtu(struct net_device *device, int mtu) +{ + struct vnic *vnic; + int ret = 0; + int pri_max_mtu; + int sec_max_mtu; + + VNIC_FUNCTION("vnic_change_mtu()\n"); + vnic = (struct vnic *)device->priv; + + if (vnic->primary_path.viport) + pri_max_mtu = viport_max_mtu(vnic->primary_path.viport); + else + pri_max_mtu = MAX_PARAM_VALUE; + + if (vnic->secondary_path.viport) + sec_max_mtu = viport_max_mtu(vnic->secondary_path.viport); + else + sec_max_mtu = MAX_PARAM_VALUE; + + if ((mtu < pri_max_mtu) && (mtu < sec_max_mtu)) { + device->mtu = mtu; + vnic_npevent_queue_evt(&vnic->primary_path, + VNIC_PRINP_SETLINK); + vnic_npevent_queue_evt(&vnic->secondary_path, + VNIC_SECNP_SETLINK); + } else if (pri_max_mtu < sec_max_mtu) + printk(KERN_WARNING PFX "%s: Maximum " + "supported MTU size is %d. " + "Cannot set MTU to %d\n", + vnic->config->name, pri_max_mtu, mtu); + else + printk(KERN_WARNING PFX "%s: Maximum " + "supported MTU size is %d. 
" + "Cannot set MTU to %d\n", + vnic->config->name, sec_max_mtu, mtu); + + return ret; +} + +static int vnic_npevent_register(struct vnic *vnic, struct netpath *netpath) +{ + u8 *address; + int ret; + + if (!vnic->mac_set) { + /* if netpath == secondary_path, then the primary path isn't + * connected. MAC address will be set when the primary + * connects. + */ + netpath_get_hw_addr(netpath, vnic->netdevice->dev_addr); + address = vnic->netdevice->dev_addr; + + if (vnic->secondary_path.viport) + viport_set_unicast(vnic->secondary_path.viport, + address); + + vnic->mac_set = 1; + } + ret = register_netdev(vnic->netdevice); + if (ret) { + printk(KERN_ERR PFX "%s failed registering netdev " + "error %d - calling viport_failure\n", + config_viport_name(vnic->primary_path.viport->config), + ret); + vnic_free(vnic); + printk(KERN_ERR PFX "%s DELETED : register_netdev failure\n", + config_viport_name(vnic->primary_path.viport->config)); + return ret; + } + + vnic->state = VNIC_REGISTERED; + vnic->carrier = 2; /*special value to force netif_carrier_(on|off)*/ + return 0; +} + +static void vnic_npevent_dequeue_all(struct vnic *vnic) +{ + unsigned long flags; + struct vnic_npevent *npevt, *tmp; + + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + if (list_empty(&vnic_npevent_list)) + goto out; + list_for_each_entry_safe(npevt, tmp, &vnic_npevent_list, + list_ptrs) { + if ((npevt->vnic == vnic)) { + list_del(&npevt->list_ptrs); + kfree(npevt); + } + } +out: + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); +} + +static void update_path_and_reconnect(struct netpath *netpath, + struct vnic *vnic) +{ + struct viport_config *config = netpath->viport->config; + int delay = 1; + + if (vnic_ib_get_path(netpath, vnic)) + return; + /* + * tell viport_connect to wait for default_no_path_timeout + * before connecting if we are retrying the same path index + * within default_no_path_timeout. 
+ * This prevents flooding connect requests to a path (or set + * of paths) that aren't successfully connecting for some reason. + */ + if (jiffies > netpath->connect_time + + vnic->config->no_path_timeout) { + netpath->path_idx = config->path_idx; + netpath->connect_time = jiffies; + netpath->delay_reconnect = 0; + delay = 0; + } else if (config->path_idx != netpath->path_idx) { + delay = netpath->delay_reconnect; + netpath->path_idx = config->path_idx; + netpath->delay_reconnect = 1; + } else + delay = 1; + viport_connect(netpath->viport, delay); +} + +static void vnic_set_uni_multicast(struct vnic *vnic, + struct netpath *netpath) +{ + unsigned long flags; + u8 *address; + + if (vnic->mac_set) { + address = vnic->netdevice->dev_addr; + + if (netpath->viport) + viport_set_unicast(netpath->viport, address); + } + spin_lock_irqsave(&vnic->lock, flags); + + if (vnic->mc_list && netpath->viport) + viport_set_multicast(netpath->viport, vnic->mc_list, + vnic->mc_count); + + spin_unlock_irqrestore(&vnic->lock, flags); + if (vnic->state == VNIC_REGISTERED) { + if (!netpath->viport) + return; + viport_set_link(netpath->viport, + vnic->netdevice->flags & ~IFF_UP, + vnic->netdevice->mtu); + } +} + +static void vnic_set_netpath_timers(struct vnic *vnic, + struct netpath *netpath) +{ + switch (netpath->timer_state) { + case NETPATH_TS_IDLE: + netpath->timer_state = NETPATH_TS_ACTIVE; + if (vnic->state == VNIC_UNINITIALIZED) + netpath_timer(netpath, + vnic->config-> + primary_connect_timeout); + else + netpath_timer(netpath, + vnic->config-> + primary_reconnect_timeout); + break; + case NETPATH_TS_ACTIVE: + /*nothing to do*/ + break; + case NETPATH_TS_EXPIRED: + if (vnic->state == VNIC_UNINITIALIZED) + vnic_npevent_register(vnic, netpath); + + break; + } +} + +static void vnic_check_primary_path_timer(struct vnic *vnic) +{ + switch (vnic->primary_path.timer_state) { + case NETPATH_TS_ACTIVE: + /* nothing to do. 
just wait */ + break; + case NETPATH_TS_IDLE: + netpath_timer(&vnic->primary_path, + vnic->config-> + primary_switch_timeout); + break; + case NETPATH_TS_EXPIRED: + printk(KERN_INFO PFX + "%s: switching to primary path\n", + vnic->config->name); + + vnic->current_path = &vnic->primary_path; + if (vnic->config->use_tx_csum + && netpath_can_tx_csum(vnic-> + current_path)) { + vnic->netdevice->features |= + NETIF_F_IP_CSUM; + } + break; + } +} + +static void vnic_carrier_loss(struct vnic *vnic, + struct netpath *last_path) +{ + if (vnic->primary_path.carrier) { + vnic->carrier = 1; + vnic->current_path = &vnic->primary_path; + + if (last_path && last_path != vnic->current_path) + printk(KERN_INFO PFX + "%s: failing over to primary path\n", + vnic->config->name); + else if (!last_path) + printk(KERN_INFO PFX "%s: using primary path\n", + vnic->config->name); + + if (vnic->config->use_tx_csum && + netpath_can_tx_csum(vnic->current_path)) + vnic->netdevice->features |= NETIF_F_IP_CSUM; + + } else if ((vnic->secondary_path.carrier) && + (vnic->secondary_path.timer_state != NETPATH_TS_ACTIVE)) { + vnic->carrier = 1; + vnic->current_path = &vnic->secondary_path; + + if (last_path && last_path != vnic->current_path) + printk(KERN_INFO PFX + "%s: failing over to secondary path\n", + vnic->config->name); + else if (!last_path) + printk(KERN_INFO PFX "%s: using secondary path\n", + vnic->config->name); + + if (vnic->config->use_tx_csum && + netpath_can_tx_csum(vnic->current_path)) + vnic->netdevice->features |= NETIF_F_IP_CSUM; + + } + +} + +static void vnic_handle_path_change(struct vnic *vnic, + struct netpath **path) +{ + struct netpath *last_path = *path; + + if (!last_path) { + if (vnic->current_path == &vnic->primary_path) + last_path = &vnic->secondary_path; + else + last_path = &vnic->primary_path; + + } + + if (vnic->current_path && vnic->current_path->viport) + viport_set_link(vnic->current_path->viport, + vnic->netdevice->flags, + vnic->netdevice->mtu); + + if 
(last_path->viport) + viport_set_link(last_path->viport, + vnic->netdevice->flags & + ~IFF_UP, vnic->netdevice->mtu); + + vnic_restart_xmit(vnic, vnic->current_path); +} + +static void vnic_report_path_change(struct vnic *vnic, + struct netpath *last_path, + int other_path_ok) +{ + if (!vnic->current_path) { + if (last_path == &vnic->primary_path) + printk(KERN_INFO PFX "%s: primary path lost, " + "no failover path available\n", + vnic->config->name); + else + printk(KERN_INFO PFX "%s: secondary path lost, " + "no failover path available\n", + vnic->config->name); + return; + } + + if (last_path != vnic->current_path) + return; + + if (vnic->current_path == &vnic->secondary_path) { + if (other_path_ok != vnic->primary_path.carrier) { + if (other_path_ok) + printk(KERN_INFO PFX "%s: primary path no" + " longer available for failover\n", + vnic->config->name); + else + printk(KERN_INFO PFX "%s: primary path now" + " available for failover\n", + vnic->config->name); + } + } else { + if (other_path_ok != vnic->secondary_path.carrier) { + if (other_path_ok) + printk(KERN_INFO PFX "%s: secondary path no" + " longer available for failover\n", + vnic->config->name); + else + printk(KERN_INFO PFX "%s: secondary path now" + " available for failover\n", + vnic->config->name); + } + } +} + +static void vnic_handle_free_vnic_evt(struct vnic *vnic) +{ + netpath_timer_stop(&vnic->primary_path); + netpath_timer_stop(&vnic->secondary_path); + vnic->current_path = NULL; + netpath_free(&vnic->primary_path); + netpath_free(&vnic->secondary_path); + if (vnic->state == VNIC_REGISTERED) { + unregister_netdev(vnic->netdevice); + free_netdev(vnic->netdevice); + } + vnic_npevent_dequeue_all(vnic); + kfree(vnic->config); + if (vnic->mc_list_len) { + vnic->mc_list_len = vnic->mc_count = 0; + kfree(vnic->mc_list); + } + + sysfs_remove_group(&vnic->dev_info.dev.kobj, + &vnic_dev_attr_group); + vnic_cleanup_stats_files(vnic); + device_unregister(&vnic->dev_info.dev); + 
wait_for_completion(&vnic->dev_info.released); +} + +static struct vnic *vnic_handle_npevent(struct vnic *vnic, + enum vnic_npevent_type npevt_type) +{ + struct netpath *netpath; + const char *netpath_str; + + if (npevt_type <= VNIC_PRINP_LASTTYPE) + netpath_str = netpath_to_string(vnic, &vnic->primary_path); + else if (npevt_type <= VNIC_SECNP_LASTTYPE) + netpath_str = netpath_to_string(vnic, &vnic->secondary_path); + else + netpath_str = netpath_to_string(vnic, vnic->current_path); + + VNIC_INFO("%s: processing %s, netpath=%s, carrier=%d\n", + vnic->config->name, vnic_npevent_str[npevt_type], + netpath_str, vnic->carrier); + + switch (npevt_type) { + case VNIC_PRINP_CONNECTED: + netpath = &vnic->primary_path; + if (vnic->state == VNIC_UNINITIALIZED) { + if (vnic_npevent_register(vnic, netpath)) + break; + } + vnic_set_uni_multicast(vnic, netpath); + break; + case VNIC_SECNP_CONNECTED: + vnic_set_uni_multicast(vnic, &vnic->secondary_path); + break; + case VNIC_PRINP_TIMEREXPIRED: + netpath = &vnic->primary_path; + netpath->timer_state = NETPATH_TS_EXPIRED; + if (!netpath->carrier) + update_path_and_reconnect(netpath, vnic); + break; + case VNIC_SECNP_TIMEREXPIRED: + netpath = &vnic->secondary_path; + netpath->timer_state = NETPATH_TS_EXPIRED; + if (!netpath->carrier) + update_path_and_reconnect(netpath, vnic); + else { + if (vnic->state == VNIC_UNINITIALIZED) + vnic_npevent_register(vnic, netpath); + } + break; + case VNIC_PRINP_LINKUP: + vnic->primary_path.carrier = 1; + break; + case VNIC_SECNP_LINKUP: + netpath = &vnic->secondary_path; + netpath->carrier = 1; + if (!vnic->carrier) + vnic_set_netpath_timers(vnic, netpath); + break; + case VNIC_PRINP_LINKDOWN: + vnic->primary_path.carrier = 0; + break; + case VNIC_SECNP_LINKDOWN: + if (vnic->state == VNIC_UNINITIALIZED) + netpath_timer_stop(&vnic->secondary_path); + vnic->secondary_path.carrier = 0; + break; + case VNIC_PRINP_DISCONNECTED: + netpath = &vnic->primary_path; + netpath_timer_stop(netpath); + 
netpath->carrier = 0; + update_path_and_reconnect(netpath, vnic); + break; + case VNIC_SECNP_DISCONNECTED: + netpath = &vnic->secondary_path; + netpath_timer_stop(netpath); + netpath->carrier = 0; + update_path_and_reconnect(netpath, vnic); + break; + case VNIC_PRINP_SETLINK: + netpath = vnic->current_path; + if (!netpath || !netpath->viport) + break; + viport_set_link(netpath->viport, + vnic->netdevice->flags, + vnic->netdevice->mtu); + break; + case VNIC_SECNP_SETLINK: + netpath = &vnic->secondary_path; + if (!netpath || !netpath->viport) + break; + viport_set_link(netpath->viport, + vnic->netdevice->flags, + vnic->netdevice->mtu); + break; + case VNIC_NP_FREEVNIC: + vnic_handle_free_vnic_evt(vnic); + kfree(vnic); + vnic = NULL; + break; + } + return vnic; +} + +static int vnic_npevent_statemachine(void *context) +{ + struct vnic_npevent *vnic_link_evt; + enum vnic_npevent_type npevt_type; + struct vnic *vnic; + int last_carrier; + int other_path_ok = 0; + struct netpath *last_path; + + while (!vnic_npevent_thread_end || + !list_empty(&vnic_npevent_list)) { + unsigned long flags; + + wait_event_interruptible(vnic_npevent_queue, + !list_empty(&vnic_npevent_list) + || vnic_npevent_thread_end); + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + if (list_empty(&vnic_npevent_list)) { + spin_unlock_irqrestore(&vnic_npevent_list_lock, + flags); + VNIC_INFO("netpath statemachine wake" + " on empty list\n"); + continue; + } + + vnic_link_evt = list_entry(vnic_npevent_list.next, + struct vnic_npevent, + list_ptrs); + list_del(&vnic_link_evt->list_ptrs); + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); + vnic = vnic_link_evt->vnic; + npevt_type = vnic_link_evt->event_type; + kfree(vnic_link_evt); + + if (vnic->current_path == &vnic->secondary_path) + other_path_ok = vnic->primary_path.carrier; + else if (vnic->current_path == &vnic->primary_path) + other_path_ok = vnic->secondary_path.carrier; + + vnic = vnic_handle_npevent(vnic, npevt_type); + + if (!vnic) + 
continue; + + last_carrier = vnic->carrier; + last_path = vnic->current_path; + + if (!vnic->current_path || + !vnic->current_path->carrier) { + vnic->carrier = 0; + vnic->current_path = NULL; + vnic->netdevice->features &= ~NETIF_F_IP_CSUM; + } + + if (!vnic->carrier) + vnic_carrier_loss(vnic, last_path); + else if ((vnic->current_path != &vnic->primary_path) && + (vnic->config->prefer_primary) && + (vnic->primary_path.carrier)) + vnic_check_primary_path_timer(vnic); + + if (last_path) + vnic_report_path_change(vnic, last_path, + other_path_ok); + + VNIC_INFO("new netpath=%s, carrier=%d\n", + netpath_to_string(vnic, vnic->current_path), + vnic->carrier); + + if (vnic->current_path != last_path) + vnic_handle_path_change(vnic, &last_path); + + if (vnic->carrier != last_carrier) { + if (vnic->carrier) { + VNIC_INFO("netif_carrier_on\n"); + netif_carrier_on(vnic->netdevice); + vnic_carrier_loss_stats(vnic); + } else { + VNIC_INFO("netif_carrier_off\n"); + netif_carrier_off(vnic->netdevice); + vnic_disconn_stats(vnic); + } + + } + } + complete_and_exit(&vnic_npevent_thread_exit, 0); + return 0; +} + +void vnic_npevent_queue_evt(struct netpath *netpath, + enum vnic_npevent_type evt) +{ + struct vnic_npevent *npevent; + unsigned long flags; + + npevent = kmalloc(sizeof *npevent, GFP_ATOMIC); + if (!npevent) { + VNIC_ERROR("Could not allocate memory for vnic event\n"); + return; + } + npevent->vnic = netpath->parent; + npevent->event_type = evt; + INIT_LIST_HEAD(&npevent->list_ptrs); + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + list_add_tail(&npevent->list_ptrs, &vnic_npevent_list); + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); + wake_up(&vnic_npevent_queue); +} + +void vnic_npevent_dequeue_evt(struct netpath *netpath, + enum vnic_npevent_type evt) +{ + unsigned long flags; + struct vnic_npevent *npevt, *tmp; + struct vnic *vnic = netpath->parent; + + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + if (list_empty(&vnic_npevent_list)) + goto 
out; + list_for_each_entry_safe(npevt, tmp, &vnic_npevent_list, + list_ptrs) { + if ((npevt->vnic == vnic) && + (npevt->event_type == evt)) { + list_del(&npevt->list_ptrs); + kfree(npevt); + break; + } + } +out: + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); +} + +static int vnic_npevent_start(void) +{ + VNIC_FUNCTION("vnic_npevent_start()\n"); + + spin_lock_init(&vnic_npevent_list_lock); + vnic_npevent_thread = kthread_run(vnic_npevent_statemachine, NULL, + "qlgc_vnic_npevent_s_m"); + if (IS_ERR(vnic_npevent_thread)) { + printk(KERN_WARNING PFX "failed to create vnic npevent" + " thread; error %d\n", + (int) PTR_ERR(vnic_npevent_thread)); + vnic_npevent_thread = NULL; + return 1; + } + + return 0; +} + +void vnic_npevent_cleanup(void) +{ + if (vnic_npevent_thread) { + vnic_npevent_thread_end = 1; + wake_up(&vnic_npevent_queue); + wait_for_completion(&vnic_npevent_thread_exit); + vnic_npevent_thread = NULL; + } +} + +static void vnic_setup(struct net_device *device) +{ + ether_setup(device); + + /* ether_setup is used to fill + * device parameters for ethernet devices. + * We override some of the parameters + * which are specific to VNIC. + */ + device->get_stats = vnic_get_stats; + device->open = vnic_open; + device->stop = vnic_stop; + device->hard_start_xmit = vnic_hard_start_xmit; + device->tx_timeout = vnic_tx_timeout; + device->set_multicast_list = vnic_set_multicast_list; + device->set_mac_address = vnic_set_mac_address; + device->change_mtu = vnic_change_mtu; + device->watchdog_timeo = 10 * HZ; + device->features = 0; +} + +struct vnic *vnic_allocate(struct vnic_config *config) +{ + struct vnic *vnic = NULL; + + VNIC_FUNCTION("vnic_allocate()\n"); + vnic = kzalloc(sizeof *vnic, GFP_KERNEL); + if (!vnic) { + VNIC_ERROR("failed allocating vnic structure\n"); + return NULL; + } + + spin_lock_init(&vnic->lock); + vnic_alloc_stats(vnic); + vnic->state = VNIC_UNINITIALIZED; + vnic->config = config; + + /* Allocating a VNIC network device. 
+ * The private data structure for the VNIC is managed by the
+ * VNIC driver itself, hence the size of the private data area
+ * is set to 0.
+ */
+	vnic->netdevice = alloc_netdev((int) 0, config->name, vnic_setup);
+	if (!vnic->netdevice) {
+		VNIC_ERROR("failed allocating vnic netdevice\n");
+		kfree(vnic);
+		return NULL;
+	}
+	vnic->netdevice->priv = (void *)vnic;
+
+	netpath_init(&vnic->primary_path, vnic, 0);
+	netpath_init(&vnic->secondary_path, vnic, 1);
+
+	vnic->current_path = NULL;
+
+	list_add_tail(&vnic->list_ptrs, &vnic_list);
+
+	return vnic;
+}
+
+void vnic_free(struct vnic *vnic)
+{
+	VNIC_FUNCTION("vnic_free()\n");
+	list_del(&vnic->list_ptrs);
+	vnic_npevent_queue_evt(&vnic->primary_path, VNIC_NP_FREEVNIC);
+}
+
+static void __exit vnic_cleanup(void)
+{
+	VNIC_FUNCTION("vnic_cleanup()\n");
+
+	VNIC_INIT("unloading %s\n", MODULEDETAILS);
+
+	while (!list_empty(&vnic_list)) {
+		struct vnic *vnic =
+		    list_entry(vnic_list.next, struct vnic, list_ptrs);
+		vnic_free(vnic);
+	}
+
+	vnic_npevent_cleanup();
+	viport_cleanup();
+	vnic_ib_cleanup();
+}
+
+static int __init vnic_init(void)
+{
+	int ret;
+
+	VNIC_FUNCTION("vnic_init()\n");
+	VNIC_INIT("Initializing %s\n", MODULEDETAILS);
+
+	ret = config_start();
+	if (ret) {
+		VNIC_ERROR("config_start failed\n");
+		goto failure;
+	}
+
+	ret = vnic_ib_init();
+	if (ret) {
+		VNIC_ERROR("ib_start failed\n");
+		goto failure;
+	}
+
+	ret = viport_start();
+	if (ret) {
+		VNIC_ERROR("viport_start failed\n");
+		goto failure;
+	}
+
+	ret = vnic_npevent_start();
+	if (ret) {
+		VNIC_ERROR("vnic_npevent_start failed\n");
+		goto failure;
+	}
+
+	return 0;
+failure:
+	vnic_cleanup();
+	return ret;
+}
+
+module_init(vnic_init);
+module_exit(vnic_cleanup);
diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_main.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_main.h
new file mode 100644
index 0000000..c5ccd8b
--- /dev/null
+++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_main.h
@@ -0,0 +1,167 @@
+/*
+ * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */
+
+#ifndef VNIC_MAIN_H_INCLUDED
+#define VNIC_MAIN_H_INCLUDED
+
+#include
+#include
+#include
+#include
+
+#include "vnic_config.h"
+#include "vnic_netpath.h"
+
+extern u16 vnic_max_mtu;
+extern struct list_head vnic_list;
+extern struct attribute_group vnic_stats_attr_group;
+extern cycles_t recv_ref;
+
+enum vnic_npevent_type {
+	VNIC_PRINP_CONNECTED	= 0,
+	VNIC_PRINP_DISCONNECTED	= 1,
+	VNIC_PRINP_LINKUP	= 2,
+	VNIC_PRINP_LINKDOWN	= 3,
+	VNIC_PRINP_TIMEREXPIRED	= 4,
+	VNIC_PRINP_SETLINK	= 5,
+
+	/* used to figure out PRI vs SEC types for dbg msg */
+	VNIC_PRINP_LASTTYPE	= VNIC_PRINP_SETLINK,
+
+	VNIC_SECNP_CONNECTED	= 6,
+	VNIC_SECNP_DISCONNECTED	= 7,
+	VNIC_SECNP_LINKUP	= 8,
+	VNIC_SECNP_LINKDOWN	= 9,
+	VNIC_SECNP_TIMEREXPIRED	= 10,
+	VNIC_SECNP_SETLINK	= 11,
+
+	/* used to figure out PRI vs SEC types for dbg msg */
+	VNIC_SECNP_LASTTYPE	= VNIC_SECNP_SETLINK,
+
+	VNIC_NP_FREEVNIC	= 12,
+};
+
+/* This array should be kept next to enum above since a change to npevent_type
+   enum affects this array.
*/ +static const char *const vnic_npevent_str[] = { + "PRIMARY CONNECTED", + "PRIMARY DISCONNECTED", + "PRIMARY CARRIER", + "PRIMARY NO CARRIER", + "PRIMARY TIMER EXPIRED", + "PRIMARY SETLINK", + "SECONDARY CONNECTED", + "SECONDARY DISCONNECTED", + "SECONDARY CARRIER", + "SECONDARY NO CARRIER", + "SECONDARY TIMER EXPIRED", + "SECONDARY SETLINK", + "FREE VNIC", +}; + + +struct vnic_npevent { + struct list_head list_ptrs; + struct vnic *vnic; + enum vnic_npevent_type event_type; +}; + +void vnic_npevent_queue_evt(struct netpath *netpath, + enum vnic_npevent_type evt); +void vnic_npevent_dequeue_evt(struct netpath *netpath, + enum vnic_npevent_type evt); + +enum vnic_state { + VNIC_UNINITIALIZED = 0, + VNIC_REGISTERED = 1 +}; + +struct vnic { + struct list_head list_ptrs; + enum vnic_state state; + struct vnic_config *config; + struct netpath *current_path; + struct netpath primary_path; + struct netpath secondary_path; + int open; + int carrier; + int xmit_started; + int mac_set; + struct net_device_stats stats; + struct net_device *netdevice; + struct dev_info dev_info; + struct dev_mc_list *mc_list; + int mc_list_len; + int mc_count; + spinlock_t lock; +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + struct { + cycles_t start_time; + cycles_t conn_time; + cycles_t disconn_ref; /* intermediate time */ + cycles_t disconn_time; + u32 disconn_num; + cycles_t xmit_time; + u32 xmit_num; + u32 xmit_fail; + cycles_t recv_time; + u32 recv_num; + u32 multicast_recv_num; + cycles_t xmit_ref; /* intermediate time */ + cycles_t xmit_off_time; + u32 xmit_off_num; + cycles_t carrier_ref; /* intermediate time */ + cycles_t carrier_off_time; + u32 carrier_off_num; + } statistics; + struct dev_info stat_info; +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ +}; + +struct vnic *vnic_allocate(struct vnic_config *config); + +void vnic_free(struct vnic *vnic); + +void vnic_connected(struct vnic *vnic, struct netpath *netpath); +void vnic_disconnected(struct vnic *vnic, struct netpath 
*netpath); + +void vnic_link_up(struct vnic *vnic, struct netpath *netpath); +void vnic_link_down(struct vnic *vnic, struct netpath *netpath); + +void vnic_stop_xmit(struct vnic *vnic, struct netpath *netpath); +void vnic_restart_xmit(struct vnic *vnic, struct netpath *netpath); + +void vnic_recv_packet(struct vnic *vnic, struct netpath *netpath, + struct sk_buff *skb); +void vnic_npevent_cleanup(void); +void completion_callback_cleanup(struct vnic_ib_conn *ib_conn); +#endif /* VNIC_MAIN_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:17:24 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:47:24 +0530 Subject: [ofa-general] [PATCH 03/13] QLogic VNIC: Implementation of communication protocol with EVIC/VEx In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171724.31725.91243.stgit@localhost.localdomain> From: Poornima Kamath Implementation of the statemachine for the protocol used while communicating with the EVIC. The patch also implements the viport abstraction which represents the virtual ethernet port on EVIC. Signed-off-by: Ramachandra K Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c | 1233 ++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h | 176 +++ 2 files changed, 1409 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c new file mode 100644 index 0000000..e44e31b --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c @@ -0,0 +1,1233 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "vnic_util.h"
+#include "vnic_main.h"
+#include "vnic_viport.h"
+#include "vnic_netpath.h"
+#include "vnic_control.h"
+#include "vnic_data.h"
+#include "vnic_config.h"
+#include "vnic_control_pkt.h"
+
+#define VIPORT_DISCONN_TIMER		10000	/* 10 seconds */
+
+#define MAX_RETRY_INTERVAL		20000	/* 20 seconds */
+#define RETRY_INCREMENT			5000	/* 5 seconds */
+#define MAX_CONNECT_RETRY_TIMEOUT	600000	/* 10 minutes */
+
+static DECLARE_WAIT_QUEUE_HEAD(viport_queue);
+static LIST_HEAD(viport_list);
+static DECLARE_COMPLETION(viport_thread_exit);
+static spinlock_t viport_list_lock;
+
+static struct task_struct *viport_thread;
+static int viport_thread_end;
+
+static void viport_timer(struct viport *viport, int timeout);
+
+struct viport *viport_allocate(struct viport_config *config)
+{
+	struct viport *viport;
+
+	VIPORT_FUNCTION("viport_allocate()\n");
+	viport = kzalloc(sizeof *viport, GFP_KERNEL);
+	if (!viport) {
+		VIPORT_ERROR("failed allocating viport structure\n");
+		return NULL;
+	}
+
+	viport->state = VIPORT_DISCONNECTED;
+	viport->link_state = LINK_FIRSTCONNECT;
+	viport->new_mtu = 1500;
+	viport->new_flags = 0;
+	viport->config = config;
+	viport->connect = DELAY;
+	viport->data.max_mtu = vnic_max_mtu;
+	spin_lock_init(&viport->lock);
+	init_waitqueue_head(&viport->stats_queue);
+	init_waitqueue_head(&viport->disconnect_queue);
+	init_waitqueue_head(&viport->reference_queue);
+	INIT_LIST_HEAD(&viport->list_ptrs);
+
+	vnic_mc_init(viport);
+
+	return viport;
+}
+
+void viport_connect(struct viport *viport, int delay)
+{
+	VIPORT_FUNCTION("viport_connect()\n");
+
+	if (viport->connect != DELAY)
+		viport->connect = (delay) ?
DELAY : NOW; + if (viport->link_state == LINK_FIRSTCONNECT) { + u32 duration; + duration = (net_random() & 0x1ff); + if (!viport->parent->is_primary_path) + duration += 0x1ff; + viport->link_state = LINK_RETRYWAIT; + viport_timer(viport, duration); + } else + viport_kick(viport); +} + +void viport_disconnect(struct viport *viport) +{ + VIPORT_FUNCTION("viport_disconnect()\n"); + viport->disconnect = 1; + viport_failure(viport); + wait_event(viport->disconnect_queue, viport->disconnect == 0); +} + +void viport_free(struct viport *viport) +{ + VIPORT_FUNCTION("viport_free()\n"); + vnic_mc_uninit(viport); + viport_disconnect(viport); /* NOTE: this can sleep */ + kfree(viport->config); + kfree(viport); +} + +void viport_set_link(struct viport *viport, u16 flags, u16 mtu) +{ + unsigned long localflags; + int i; + + VIPORT_FUNCTION("viport_set_link()\n"); + if (mtu > data_max_mtu(&viport->data)) { + VIPORT_ERROR("configuration error." + " mtu of %d unsupported by %s\n", mtu, + config_viport_name(viport->config)); + goto failure; + } + + spin_lock_irqsave(&viport->lock, localflags); + flags &= IFF_UP | IFF_ALLMULTI | IFF_PROMISC; + if ((viport->new_flags != flags) + || (viport->new_mtu != mtu)) { + viport->new_flags = flags; + viport->new_mtu = mtu; + viport->updates |= NEED_LINK_CONFIG; + if (viport->features_supported & VNIC_FEAT_INBOUND_IB_MC) { + if (((viport->mtu <= MCAST_MSG_SIZE) && (mtu > MCAST_MSG_SIZE)) || + ((viport->mtu > MCAST_MSG_SIZE) && (mtu <= MCAST_MSG_SIZE))) { + /* + * MTU value will enable/disable the multicast. In + * either case, need to send the CMD_CONFIG_ADDRESS2 to + * EVIC. Hence, setting the NEED_ADDRESS_CONFIG flag. 
+ */ + viport->updates |= NEED_ADDRESS_CONFIG; + if (mtu <= MCAST_MSG_SIZE) { + VIPORT_PRINT("%s: MTU changed; " + "old:%d new:%d (threshold:%d);" + " MULTICAST will be enabled.\n", + config_viport_name(viport->config), + viport->mtu, mtu, + (int)MCAST_MSG_SIZE); + } else { + VIPORT_PRINT("%s: MTU changed; " + "old:%d new:%d (threshold:%d); " + "MULTICAST will be disabled.\n", + config_viport_name(viport->config), + viport->mtu, mtu, + (int)MCAST_MSG_SIZE); + } + /* When we resend these addresses, EVIC will + * send mgid=0 back in response. So no need to + * shutoff ib_multicast. + */ + for (i = MCAST_ADDR_START; i < viport->num_mac_addresses; i++) { + if (viport->mac_addresses[i].valid) + viport->mac_addresses[i].operation = VNIC_OP_SET_ENTRY; + } + } + } + viport_kick(viport); + } + + spin_unlock_irqrestore(&viport->lock, localflags); + return; +failure: + viport_failure(viport); +} + +int viport_set_unicast(struct viport *viport, u8 *address) +{ + unsigned long flags; + int ret = -1; + VIPORT_FUNCTION("viport_set_unicast()\n"); + spin_lock_irqsave(&viport->lock, flags); + + if (!viport->mac_addresses) + goto out; + + if (memcmp(viport->mac_addresses[UNICAST_ADDR].address, + address, ETH_ALEN)) { + memcpy(viport->mac_addresses[UNICAST_ADDR].address, + address, ETH_ALEN); + viport->mac_addresses[UNICAST_ADDR].operation + = VNIC_OP_SET_ENTRY; + viport->updates |= NEED_ADDRESS_CONFIG; + viport_kick(viport); + } + ret = 0; +out: + spin_unlock_irqrestore(&viport->lock, flags); + return ret; +} + +int viport_set_multicast(struct viport *viport, + struct dev_mc_list *mc_list, int mc_count) +{ + u32 old_update_list; + int i; + int ret = -1; + unsigned long flags; + + VIPORT_FUNCTION("viport_set_multicast()\n"); + spin_lock_irqsave(&viport->lock, flags); + + if (!viport->mac_addresses) + goto out; + + old_update_list = viport->updates; + if (mc_count > viport->num_mac_addresses - MCAST_ADDR_START) + viport->updates |= NEED_LINK_CONFIG | MCAST_OVERFLOW; + else { + if 
(mc_count == 0) { + ret = 0; + goto out; + } + if (viport->updates & MCAST_OVERFLOW) { + viport->updates &= ~MCAST_OVERFLOW; + viport->updates |= NEED_LINK_CONFIG; + } + for (i = MCAST_ADDR_START; i < mc_count + MCAST_ADDR_START; + i++, mc_list = mc_list->next) { + if (viport->mac_addresses[i].valid && + !memcmp(viport->mac_addresses[i].address, + mc_list->dmi_addr, ETH_ALEN)) + continue; + memcpy(viport->mac_addresses[i].address, + mc_list->dmi_addr, ETH_ALEN); + viport->mac_addresses[i].valid = 1; + viport->mac_addresses[i].operation = VNIC_OP_SET_ENTRY; + } + for (; i < viport->num_mac_addresses; i++) { + if (!viport->mac_addresses[i].valid) + continue; + viport->mac_addresses[i].valid = 0; + viport->mac_addresses[i].operation = VNIC_OP_SET_ENTRY; + } + if (mc_count) + viport->updates |= NEED_ADDRESS_CONFIG; + } + + if (viport->updates != old_update_list) + viport_kick(viport); + ret = 0; +out: + spin_unlock_irqrestore(&viport->lock, flags); + return ret; +} + +static inline void viport_disable_multicast(struct viport *viport) +{ + VIPORT_INFO("turned off IB_MULTICAST\n"); + viport->config->control_config.ib_multicast = 0; + viport->config->control_config.ib_config.conn_data.features_supported &= + __constant_cpu_to_be32((u32)~VNIC_FEAT_INBOUND_IB_MC); + viport->link_state = LINK_RESET; +} + +void viport_get_stats(struct viport *viport, + struct net_device_stats *stats) +{ + unsigned long flags; + + VIPORT_FUNCTION("viport_get_stats()\n"); + if (jiffies > viport->last_stats_time + + viport->config->stats_interval) { + + spin_lock_irqsave(&viport->lock, flags); + viport->updates |= NEED_STATS; + /* increment reference count which indicates + * that viport structure is being used, which + * prevents its freeing when this task sleeps + */ + viport->reference_count++; + spin_unlock_irqrestore(&viport->lock, flags); + viport_kick(viport); + wait_event(viport->stats_queue, + !(viport->updates & NEED_STATS)); + + if (viport->stats.ethernet_status) + 
vnic_link_up(viport->vnic, viport->parent); + else + vnic_link_down(viport->vnic, viport->parent); + + } else { + spin_lock_irqsave(&viport->lock, flags); + viport->reference_count++; + spin_unlock_irqrestore(&viport->lock, flags); + } + + stats->rx_packets = be64_to_cpu(viport->stats.if_in_ok); + stats->tx_packets = be64_to_cpu(viport->stats.if_out_ok); + stats->rx_bytes = be64_to_cpu(viport->stats.if_in_octets); + stats->tx_bytes = be64_to_cpu(viport->stats.if_out_octets); + stats->rx_errors = be64_to_cpu(viport->stats.if_in_errors); + stats->tx_errors = be64_to_cpu(viport->stats.if_out_errors); + stats->rx_dropped = 0; /* EIOC doesn't track */ + stats->tx_dropped = 0; /* EIOC doesn't track */ + stats->multicast = be64_to_cpu(viport->stats.if_in_nucast_pkts); + stats->collisions = 0; /* EIOC doesn't track */ + + spin_lock_irqsave(&viport->lock, flags); + viport->reference_count--; + spin_unlock_irqrestore(&viport->lock, flags); + wake_up(&viport->reference_queue); +} + +int viport_xmit_packet(struct viport *viport, struct sk_buff *skb) +{ + int status = -1; + unsigned long flags; + + VIPORT_FUNCTION("viport_xmit_packet()\n"); + spin_lock_irqsave(&viport->lock, flags); + if (viport->state == VIPORT_CONNECTED) + status = data_xmit_packet(&viport->data, skb); + spin_unlock_irqrestore(&viport->lock, flags); + + return status; +} + +void viport_kick(struct viport *viport) +{ + unsigned long flags; + + VIPORT_FUNCTION("viport_kick()\n"); + spin_lock_irqsave(&viport_list_lock, flags); + if (list_empty(&viport->list_ptrs)) { + list_add_tail(&viport->list_ptrs, &viport_list); + wake_up(&viport_queue); + } + spin_unlock_irqrestore(&viport_list_lock, flags); +} + +void viport_failure(struct viport *viport) +{ + unsigned long flags; + + VIPORT_FUNCTION("viport_failure()\n"); + spin_lock_irqsave(&viport_list_lock, flags); + viport->errored = 1; + if (list_empty(&viport->list_ptrs)) { + list_add_tail(&viport->list_ptrs, &viport_list); + wake_up(&viport_queue); + } + 
spin_unlock_irqrestore(&viport_list_lock, flags);
+}
+
+static void viport_timeout(unsigned long data)
+{
+	struct viport *viport;
+
+	VIPORT_FUNCTION("viport_timeout()\n");
+	viport = (struct viport *)data;
+	viport->timer_active = 0;
+	viport_kick(viport);
+}
+
+static void viport_timer(struct viport *viport, int timeout)
+{
+	VIPORT_FUNCTION("viport_timer()\n");
+	if (viport->timer_active)
+		del_timer(&viport->timer);
+	init_timer(&viport->timer);
+	viport->timer.expires = jiffies + timeout;
+	viport->timer.data = (unsigned long)viport;
+	viport->timer.function = viport_timeout;
+	viport->timer_active = 1;
+	add_timer(&viport->timer);
+}
+
+static void viport_timer_stop(struct viport *viport)
+{
+	VIPORT_FUNCTION("viport_timer_stop()\n");
+	if (viport->timer_active)
+		del_timer(&viport->timer);
+	viport->timer_active = 0;
+}
+
+static int viport_init_mac_addresses(struct viport *viport)
+{
+	struct vnic_address_op2 *temp;
+	unsigned long flags;
+	int i;
+
+	VIPORT_FUNCTION("viport_init_mac_addresses()\n");
+	temp = kzalloc(viport->num_mac_addresses * sizeof *temp,
+		       GFP_KERNEL);
+	if (!temp) {
+		VIPORT_ERROR("failed allocating MAC address table\n");
+		return -ENOMEM;
+	}
+
+	spin_lock_irqsave(&viport->lock, flags);
+	viport->mac_addresses = temp;
+	for (i = 0; i < viport->num_mac_addresses; i++) {
+		viport->mac_addresses[i].index = cpu_to_be16(i);
+		viport->mac_addresses[i].vlan =
+		    cpu_to_be16(viport->default_vlan);
+	}
+	memset(viport->mac_addresses[BROADCAST_ADDR].address,
+	       0xFF, ETH_ALEN);
+	viport->mac_addresses[BROADCAST_ADDR].valid = 1;
+	memcpy(viport->mac_addresses[UNICAST_ADDR].address,
+	       viport->hw_mac_address, ETH_ALEN);
+	viport->mac_addresses[UNICAST_ADDR].valid = 1;
+
+	spin_unlock_irqrestore(&viport->lock, flags);
+
+	return 0;
+}
+
+static inline void viport_match_mac_address(struct vnic *vnic,
+					    struct viport *viport)
+{
+	if (vnic && vnic->current_path &&
+	    viport == vnic->current_path->viport
&&
+	    vnic->mac_set &&
+	    memcmp(vnic->netdevice->dev_addr, viport->hw_mac_address, ETH_ALEN)) {
+		VIPORT_ERROR("*** ERROR MAC address mismatch; "
+			     "current = %02x:%02x:%02x:%02x:%02x:%02x "
+			     "From EVIC = %02x:%02x:%02x:%02x:%02x:%02x\n",
+			     vnic->netdevice->dev_addr[0],
+			     vnic->netdevice->dev_addr[1],
+			     vnic->netdevice->dev_addr[2],
+			     vnic->netdevice->dev_addr[3],
+			     vnic->netdevice->dev_addr[4],
+			     vnic->netdevice->dev_addr[5],
+			     viport->hw_mac_address[0],
+			     viport->hw_mac_address[1],
+			     viport->hw_mac_address[2],
+			     viport->hw_mac_address[3],
+			     viport->hw_mac_address[4],
+			     viport->hw_mac_address[5]);
+	}
+}
+
+static int viport_handle_init_states(struct viport *viport)
+{
+	enum link_state old_state;
+
+	do {
+		switch (old_state = viport->link_state) {
+		case LINK_UNINITIALIZED:
+			LINK_STATE("state LINK_UNINITIALIZED\n");
+			viport->updates = 0;
+			/* cleanup_started ensures that no more
+			 * get_stats requests will be sent.
+			 * Old stats will be returned.
+			 */
+			viport->parent->cleanup_started = 1;
+			wake_up(&viport->stats_queue);
+			spin_lock_irq(&viport_list_lock);
+			list_del_init(&viport->list_ptrs);
+			spin_unlock_irq(&viport_list_lock);
+			spin_lock_irq(&viport->lock);
+			if (viport->reference_count) {
+				spin_unlock_irq(&viport->lock);
+				wait_event(viport->reference_queue,
+					   viport->reference_count == 0);
+			} else
+				spin_unlock_irq(&viport->lock);
+			/* No more references to the viport structure,
+			 * so it is safe to delete it by waking the
+			 * disconnect queue.
+			 */
+
+			viport->disconnect = 0;
+			wake_up(&viport->disconnect_queue);
+			break;
+		case LINK_INITIALIZE:
+			LINK_STATE("state LINK_INITIALIZE\n");
+			viport->errored = 0;
+			viport->connect = WAIT;
+			viport->last_stats_time = 0;
+			if (viport->disconnect)
+				viport->link_state = LINK_UNINITIALIZED;
+			else
+				viport->link_state = LINK_INITIALIZECONTROL;
+			break;
+		case LINK_INITIALIZECONTROL:
+			LINK_STATE("state LINK_INITIALIZECONTROL\n");
+			viport->pd = ib_alloc_pd(viport->config->ibdev);
+			if (IS_ERR(viport->pd))
viport->link_state = LINK_DISCONNECTED; + else if (control_init(&viport->control, viport, + &viport->config->control_config, + viport->pd)) { + ib_dealloc_pd(viport->pd); + viport->link_state = LINK_DISCONNECTED; + + } else + viport->link_state = LINK_INITIALIZEDATA; + break; + case LINK_INITIALIZEDATA: + LINK_STATE("state LINK_INITIALIZEDATA\n"); + if (data_init(&viport->data, viport, + &viport->config->data_config, + viport->pd)) + viport->link_state = LINK_CLEANUPCONTROL; + else + viport->link_state = LINK_CONTROLCONNECT; + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_control_states(struct viport *viport) +{ + enum link_state old_state; + struct vnic *vnic; + + do { + switch (old_state = viport->link_state) { + case LINK_CONTROLCONNECT: + if (vnic_ib_cm_connect(&viport->control.ib_conn)) + viport->link_state = LINK_CLEANUPDATA; + else + viport->link_state = LINK_CONTROLCONNECTWAIT; + break; + case LINK_CONTROLCONNECTWAIT: + LINK_STATE("state LINK_CONTROLCONNECTWAIT\n"); + if (control_is_connected(&viport->control)) + viport->link_state = LINK_INITVNICREQ; + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_CONTROLDISCONNECT; + } + break; + case LINK_INITVNICREQ: + LINK_STATE("state LINK_INITVNICREQ\n"); + if (control_init_vnic_req(&viport->control)) + viport->link_state = LINK_RESETCONTROL; + else + viport->link_state = LINK_INITVNICRSP; + break; + case LINK_INITVNICRSP: + LINK_STATE("state LINK_INITVNICRSP\n"); + control_process_async(&viport->control); + + if (!control_init_vnic_rsp(&viport->control, + &viport->features_supported, + viport->hw_mac_address, + &viport->num_mac_addresses, + &viport->default_vlan)) { + if (viport_init_mac_addresses(viport)) + viport->link_state = + LINK_RESETCONTROL; + else { + viport->link_state = + LINK_BEGINDATAPATH; + /* + * Ensure that the current path's MAC + * address matches the one returned by + * EVIC - we've had cases 
of mismatch + * which then caused havoc. + */ + vnic = viport->parent->parent; + viport_match_mac_address(vnic, viport); + } + } + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESETCONTROL; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_data_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_BEGINDATAPATH: + LINK_STATE("state LINK_BEGINDATAPATH\n"); + viport->link_state = LINK_CONFIGDATAPATHREQ; + break; + case LINK_CONFIGDATAPATHREQ: + LINK_STATE("state LINK_CONFIGDATAPATHREQ\n"); + if (control_config_data_path_req(&viport->control, + data_path_id(&viport-> + data), + data_host_pool_max + (&viport->data), + data_eioc_pool_max + (&viport->data))) + viport->link_state = LINK_RESETCONTROL; + else + viport->link_state = LINK_CONFIGDATAPATHRSP; + break; + case LINK_CONFIGDATAPATHRSP: + LINK_STATE("state LINK_CONFIGDATAPATHRSP\n"); + control_process_async(&viport->control); + + if (!control_config_data_path_rsp(&viport->control, + data_host_pool + (&viport->data), + data_eioc_pool + (&viport->data), + data_host_pool_max + (&viport->data), + data_eioc_pool_max + (&viport->data), + data_host_pool_min + (&viport->data), + data_eioc_pool_min + (&viport->data))) + viport->link_state = LINK_DATACONNECT; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESETCONTROL; + } + break; + case LINK_DATACONNECT: + LINK_STATE("state LINK_DATACONNECT\n"); + if (data_connect(&viport->data)) + viport->link_state = LINK_RESETCONTROL; + else + viport->link_state = LINK_DATACONNECTWAIT; + break; + case LINK_DATACONNECTWAIT: + LINK_STATE("state LINK_DATACONNECTWAIT\n"); + control_process_async(&viport->control); + if (data_is_connected(&viport->data)) + viport->link_state = LINK_XCHGPOOLREQ; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + 
} + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_xchgpool_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_XCHGPOOLREQ: + LINK_STATE("state LINK_XCHGPOOLREQ\n"); + if (control_exchange_pools_req(&viport->control, + data_local_pool_addr + (&viport->data), + data_local_pool_rkey + (&viport->data))) + viport->link_state = LINK_RESET; + else + viport->link_state = LINK_XCHGPOOLRSP; + break; + case LINK_XCHGPOOLRSP: + LINK_STATE("state LINK_XCHGPOOLRSP\n"); + control_process_async(&viport->control); + + if (!control_exchange_pools_rsp(&viport->control, + data_remote_pool_addr + (&viport->data), + data_remote_pool_rkey + (&viport->data))) + viport->link_state = LINK_INITIALIZED; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_INITIALIZED: + LINK_STATE("state LINK_INITIALIZED\n"); + viport->state = VIPORT_CONNECTED; + printk(KERN_INFO PFX + "%s: connection established\n", + config_viport_name(viport->config)); + data_connected(&viport->data); + vnic_connected(viport->parent->parent, + viport->parent); + if (viport->features_supported & VNIC_FEAT_INBOUND_IB_MC) { + printk(KERN_INFO PFX "%s: Supports Inbound IB " + "Multicast\n", + config_viport_name(viport->config)); + if (mc_data_init(&viport->mc_data, viport, + &viport->config->data_config, + viport->pd)) { + viport_disable_multicast(viport); + break; + } + } + spin_lock_irq(&viport->lock); + viport->mtu = 1500; + viport->flags = 0; + if ((viport->mtu != viport->new_mtu) || + (viport->flags != viport->new_flags)) + viport->updates |= NEED_LINK_CONFIG; + spin_unlock_irq(&viport->lock); + viport->link_state = LINK_IDLE; + viport->retry_duration = 0; + viport->total_retry_duration = 0; + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int 
viport_handle_idle_states(struct viport *viport) +{ + enum link_state old_state; + int handle_mc_join_compl, handle_mc_join; + + do { + switch (old_state = viport->link_state) { + case LINK_IDLE: + LINK_STATE("state LINK_IDLE\n"); + if (viport->config->hb_interval) + viport_timer(viport, + viport->config->hb_interval); + viport->link_state = LINK_IDLING; + break; + case LINK_IDLING: + LINK_STATE("state LINK_IDLING\n"); + control_process_async(&viport->control); + if (viport->errored) { + viport_timer_stop(viport); + viport->errored = 0; + viport->link_state = LINK_RESET; + break; + } + + spin_lock_irq(&viport->lock); + handle_mc_join = (viport->updates & NEED_MCAST_JOIN); + handle_mc_join_compl = + (viport->updates & NEED_MCAST_COMPLETION); + /* + * Turn off both flags, the handler functions will + * rearm them if necessary. + */ + viport->updates &= ~(NEED_MCAST_JOIN | NEED_MCAST_COMPLETION); + + if (viport->updates & NEED_LINK_CONFIG) { + viport_timer_stop(viport); + viport->link_state = LINK_CONFIGLINKREQ; + } else if (viport->updates & NEED_ADDRESS_CONFIG) { + viport_timer_stop(viport); + viport->link_state = LINK_CONFIGADDRSREQ; + } else if (viport->updates & NEED_STATS) { + viport_timer_stop(viport); + viport->link_state = LINK_REPORTSTATREQ; + } else if (viport->config->hb_interval) { + if (!viport->timer_active) + viport->link_state = + LINK_HEARTBEATREQ; + } + spin_unlock_irq(&viport->lock); + if (handle_mc_join) { + if (vnic_mc_join(viport)) + viport_disable_multicast(viport); + } + if (handle_mc_join_compl) + vnic_mc_join_handle_completion(viport); + + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_config_states(struct viport *viport) +{ + enum link_state old_state; + int res; + + do { + switch (old_state = viport->link_state) { + case LINK_CONFIGLINKREQ: + LINK_STATE("state LINK_CONFIGLINKREQ\n"); + spin_lock_irq(&viport->lock); + viport->updates &= ~NEED_LINK_CONFIG; + 
viport->flags = viport->new_flags; + if (viport->updates & MCAST_OVERFLOW) + viport->flags |= IFF_ALLMULTI; + viport->mtu = viport->new_mtu; + spin_unlock_irq(&viport->lock); + if (control_config_link_req(&viport->control, + viport->flags, + viport->mtu)) + viport->link_state = LINK_RESET; + else + viport->link_state = LINK_CONFIGLINKRSP; + break; + case LINK_CONFIGLINKRSP: + LINK_STATE("state LINK_CONFIGLINKRSP\n"); + control_process_async(&viport->control); + + if (!control_config_link_rsp(&viport->control, + &viport->flags, + &viport->mtu)) + viport->link_state = LINK_IDLE; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_CONFIGADDRSREQ: + LINK_STATE("state LINK_CONFIGADDRSREQ\n"); + + spin_lock_irq(&viport->lock); + res = control_config_addrs_req(&viport->control, + viport->mac_addresses, + viport-> + num_mac_addresses); + + if (res > 0) { + viport->updates &= ~NEED_ADDRESS_CONFIG; + viport->link_state = LINK_CONFIGADDRSRSP; + } else if (res == 0) + viport->link_state = LINK_CONFIGADDRSRSP; + else + viport->link_state = LINK_RESET; + spin_unlock_irq(&viport->lock); + break; + case LINK_CONFIGADDRSRSP: + LINK_STATE("state LINK_CONFIGADDRSRSP\n"); + control_process_async(&viport->control); + + if (!control_config_addrs_rsp(&viport->control)) + viport->link_state = LINK_IDLE; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_stat_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_REPORTSTATREQ: + LINK_STATE("state LINK_REPORTSTATREQ\n"); + if (control_report_statistics_req(&viport->control)) + viport->link_state = LINK_RESET; + else + viport->link_state = LINK_REPORTSTATRSP; + break; + case LINK_REPORTSTATRSP: + LINK_STATE("state LINK_REPORTSTATRSP\n"); + 
control_process_async(&viport->control); + + spin_lock_irq(&viport->lock); + if (control_report_statistics_rsp(&viport->control, + &viport->stats) == 0) { + viport->updates &= ~NEED_STATS; + viport->last_stats_time = jiffies; + wake_up(&viport->stats_queue); + viport->link_state = LINK_IDLE; + } + + spin_unlock_irq(&viport->lock); + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_heartbeat_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_HEARTBEATREQ: + LINK_STATE("state LINK_HEARTBEATREQ\n"); + if (control_heartbeat_req(&viport->control, + viport->config->hb_timeout)) + viport->link_state = LINK_RESET; + else + viport->link_state = LINK_HEARTBEATRSP; + break; + case LINK_HEARTBEATRSP: + LINK_STATE("state LINK_HEARTBEATRSP\n"); + control_process_async(&viport->control); + + if (!control_heartbeat_rsp(&viport->control)) + viport->link_state = LINK_IDLE; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_reset_states(struct viport *viport) +{ + enum link_state old_state; + int handle_mc_join_compl = 0, handle_mc_join = 0; + + do { + switch (old_state = viport->link_state) { + case LINK_RESET: + LINK_STATE("state LINK_RESET\n"); + viport->errored = 0; + spin_lock_irq(&viport->lock); + viport->state = VIPORT_DISCONNECTED; + handle_mc_join = (viport->updates & NEED_MCAST_JOIN); + handle_mc_join_compl = + (viport->updates & NEED_MCAST_COMPLETION); + /* + * Turn off both flags, the handler functions will + * rearm them if necessary. + */ + viport->updates &= ~(NEED_MCAST_JOIN | NEED_MCAST_COMPLETION); + + spin_unlock_irq(&viport->lock); + vnic_link_down(viport->vnic, viport->parent); + printk(KERN_INFO PFX + "%s: connection lost\n", + config_viport_name(viport->config)); + if (handle_mc_join) {
+ if (vnic_mc_join(viport)) + viport_disable_multicast(viport); + } + if (handle_mc_join_compl) + vnic_mc_join_handle_completion(viport); + if (viport->features_supported & VNIC_FEAT_INBOUND_IB_MC) { + VIPORT_ERROR("calling vnic_mc_leave\n"); + vnic_mc_leave(viport); + VIPORT_ERROR("calling mc_data_cleanup\n"); + mc_data_cleanup(&viport->mc_data); + } + + if (control_reset_req(&viport->control)) + viport->link_state = LINK_DATADISCONNECT; + else + viport->link_state = LINK_RESETRSP; + break; + case LINK_RESETRSP: + LINK_STATE("state LINK_RESETRSP\n"); + control_process_async(&viport->control); + + if (!control_reset_rsp(&viport->control)) + viport->link_state = LINK_DATADISCONNECT; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_DATADISCONNECT; + } + break; + case LINK_RESETCONTROL: + LINK_STATE("state LINK_RESETCONTROL\n"); + if (control_reset_req(&viport->control)) + viport->link_state = LINK_CONTROLDISCONNECT; + else + viport->link_state = LINK_RESETCONTROLRSP; + break; + case LINK_RESETCONTROLRSP: + LINK_STATE("state LINK_RESETCONTROLRSP\n"); + control_process_async(&viport->control); + + if (!control_reset_rsp(&viport->control)) + viport->link_state = LINK_CONTROLDISCONNECT; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_CONTROLDISCONNECT; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_disconn_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_DATADISCONNECT: + LINK_STATE("state LINK_DATADISCONNECT\n"); + data_disconnect(&viport->data); + viport->link_state = LINK_CONTROLDISCONNECT; + break; + case LINK_CONTROLDISCONNECT: + LINK_STATE("state LINK_CONTROLDISCONNECT\n"); + viport->link_state = LINK_CLEANUPDATA; + break; + case LINK_CLEANUPDATA: + LINK_STATE("state LINK_CLEANUPDATA\n"); + data_cleanup(&viport->data); + viport->link_state = 
LINK_CLEANUPCONTROL; + break; + case LINK_CLEANUPCONTROL: + LINK_STATE("state LINK_CLEANUPCONTROL\n"); + spin_lock_irq(&viport->lock); + kfree(viport->mac_addresses); + viport->mac_addresses = NULL; + spin_unlock_irq(&viport->lock); + control_cleanup(&viport->control); + ib_dealloc_pd(viport->pd); + viport->link_state = LINK_DISCONNECTED; + break; + case LINK_DISCONNECTED: + LINK_STATE("state LINK_DISCONNECTED\n"); + vnic_disconnected(viport->parent->parent, + viport->parent); + if (viport->disconnect != 0) + viport->link_state = LINK_UNINITIALIZED; + else if (viport->retry == 1) { + viport->retry = 0; + /* + * The retry interval starts at 5 seconds and is + * incremented by 5 seconds up to a maximum of + * 20 seconds. Retries then continue at 20-second + * intervals until 10 minutes have elapsed, after + * which retrying is stopped. + */ + if (viport->retry_duration < MAX_RETRY_INTERVAL) + viport->retry_duration += + RETRY_INCREMENT; + + viport->total_retry_duration += + viport->retry_duration; + + if (viport->total_retry_duration >= + MAX_CONNECT_RETRY_TIMEOUT) { + viport->link_state = LINK_UNINITIALIZED; + printk(KERN_INFO PFX + "Timed out after retrying " + "for %d msecs\n", + viport->total_retry_duration); + } else { + viport->connect = DELAY; + viport->link_state = LINK_RETRYWAIT; + } + viport_timer(viport, + msecs_to_jiffies(viport->retry_duration)); + } else { + u32 duration = 5000 + ((net_random()) & 0x1FF); + if (!viport->parent->is_primary_path) + duration += 0x1ff; + viport_timer(viport, + msecs_to_jiffies(duration)); + viport->connect = DELAY; + viport->link_state = LINK_RETRYWAIT; + } + break; + case LINK_RETRYWAIT: + LINK_STATE("state LINK_RETRYWAIT\n"); + viport->stats.ethernet_status = 0; + viport->updates = 0; + wake_up(&viport->stats_queue); + if (viport->disconnect != 0) { + viport_timer_stop(viport); + viport->link_state = LINK_UNINITIALIZED; + } else if (viport->connect == DELAY) { + if (!viport->timer_active) +
viport->link_state = LINK_INITIALIZE; + } else if (viport->connect == NOW) { + viport_timer_stop(viport); + viport->link_state = LINK_INITIALIZE; + } + break; + case LINK_FIRSTCONNECT: + viport->stats.ethernet_status = 0; + viport->updates = 0; + wake_up(&viport->stats_queue); + if (viport->disconnect != 0) { + viport_timer_stop(viport); + viport->link_state = LINK_UNINITIALIZED; + } + + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_statemachine(void *context) +{ + struct viport *viport; + enum link_state old_link_state; + + VIPORT_FUNCTION("viport_statemachine()\n"); + while (!viport_thread_end || !list_empty(&viport_list)) { + wait_event_interruptible(viport_queue, + !list_empty(&viport_list) + || viport_thread_end); + spin_lock_irq(&viport_list_lock); + if (list_empty(&viport_list)) { + spin_unlock_irq(&viport_list_lock); + continue; + } + viport = list_entry(viport_list.next, struct viport, + list_ptrs); + list_del_init(&viport->list_ptrs); + spin_unlock_irq(&viport_list_lock); + + do { + old_link_state = viport->link_state; + + /* + * Optimize for the state machine steady state + * by checking for the most common states first. 
+ * + */ + if (viport_handle_idle_states(viport) == 0) + break; + if (viport_handle_heartbeat_states(viport) == 0) + break; + if (viport_handle_stat_states(viport) == 0) + break; + if (viport_handle_config_states(viport) == 0) + break; + + if (viport_handle_init_states(viport) == 0) + break; + if (viport_handle_control_states(viport) == 0) + break; + if (viport_handle_data_states(viport) == 0) + break; + if (viport_handle_xchgpool_states(viport) == 0) + break; + if (viport_handle_reset_states(viport) == 0) + break; + if (viport_handle_disconn_states(viport) == 0) + break; + } while (viport->link_state != old_link_state); + } + + complete_and_exit(&viport_thread_exit, 0); +} + +int viport_start(void) +{ + VIPORT_FUNCTION("viport_start()\n"); + + spin_lock_init(&viport_list_lock); + viport_thread = kthread_run(viport_statemachine, NULL, + "qlgc_vnic_viport_s_m"); + if (IS_ERR(viport_thread)) { + printk(KERN_WARNING PFX "Could not create viport_thread;" + " error %d\n", (int) PTR_ERR(viport_thread)); + viport_thread = NULL; + return 1; + } + + return 0; +} + +void viport_cleanup(void) +{ + VIPORT_FUNCTION("viport_cleanup()\n"); + if (viport_thread) { + viport_thread_end = 1; + wake_up(&viport_queue); + wait_for_completion(&viport_thread_exit); + viport_thread = NULL; + } +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h new file mode 100644 index 0000000..bb0e7e1 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h @@ -0,0 +1,176 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_VIPORT_H_INCLUDED +#define VNIC_VIPORT_H_INCLUDED + +#include "vnic_control.h" +#include "vnic_data.h" +#include "vnic_multicast.h" + +enum viport_state { + VIPORT_DISCONNECTED = 0, + VIPORT_CONNECTED = 1 +}; + +enum link_state { + LINK_UNINITIALIZED = 0, + LINK_INITIALIZE = 1, + LINK_INITIALIZECONTROL = 2, + LINK_INITIALIZEDATA = 3, + LINK_CONTROLCONNECT = 4, + LINK_CONTROLCONNECTWAIT = 5, + LINK_INITVNICREQ = 6, + LINK_INITVNICRSP = 7, + LINK_BEGINDATAPATH = 8, + LINK_CONFIGDATAPATHREQ = 9, + LINK_CONFIGDATAPATHRSP = 10, + LINK_DATACONNECT = 11, + LINK_DATACONNECTWAIT = 12, + LINK_XCHGPOOLREQ = 13, + LINK_XCHGPOOLRSP = 14, + LINK_INITIALIZED = 15, + LINK_IDLE = 16, + LINK_IDLING = 17, + LINK_CONFIGLINKREQ = 18, + LINK_CONFIGLINKRSP = 19, + LINK_CONFIGADDRSREQ = 20, + LINK_CONFIGADDRSRSP = 21, + LINK_REPORTSTATREQ = 22, + LINK_REPORTSTATRSP = 23, + LINK_HEARTBEATREQ = 24, + LINK_HEARTBEATRSP = 25, + LINK_RESET = 26, + LINK_RESETRSP = 27, + LINK_RESETCONTROL = 28, + LINK_RESETCONTROLRSP = 29, + LINK_DATADISCONNECT = 30, + LINK_CONTROLDISCONNECT = 31, + LINK_CLEANUPDATA = 32, + LINK_CLEANUPCONTROL = 33, + LINK_DISCONNECTED = 34, + LINK_RETRYWAIT = 35, + LINK_FIRSTCONNECT = 36 +}; + +enum { + BROADCAST_ADDR = 0, + UNICAST_ADDR = 1, + MCAST_ADDR_START = 2 +}; + +#define current_mac_address mac_addresses[UNICAST_ADDR].address + +enum { + NEED_STATS = 0x00000001, + NEED_ADDRESS_CONFIG = 0x00000002, + NEED_LINK_CONFIG = 0x00000004, + MCAST_OVERFLOW = 0x00000008, + NEED_MCAST_COMPLETION = 0x00000010, + NEED_MCAST_JOIN = 0x00000020 +}; + +struct viport { + struct list_head list_ptrs; + struct netpath *parent; + struct vnic *vnic; + struct viport_config *config; + struct control control; + struct data data; + spinlock_t lock; + struct ib_pd *pd; + enum viport_state state; + enum link_state link_state; + struct vnic_cmd_report_stats_rsp stats; + wait_queue_head_t stats_queue; + u32 last_stats_time; + u32 features_supported; + u8 hw_mac_address[ETH_ALEN]; 
+ u16 default_vlan; + u16 num_mac_addresses; + struct vnic_address_op2 *mac_addresses; + u32 updates; + u16 flags; + u16 new_flags; + u16 mtu; + u16 new_mtu; + u32 errored; + enum { WAIT, DELAY, NOW } connect; + u32 disconnect; + u32 retry; + wait_queue_head_t disconnect_queue; + int timer_active; + struct timer_list timer; + u32 retry_duration; + u32 total_retry_duration; + int reference_count; + wait_queue_head_t reference_queue; + struct mc_info mc_info; + struct mc_data mc_data; +}; + +int viport_start(void); +void viport_cleanup(void); + +struct viport *viport_allocate(struct viport_config *config); +void viport_free(struct viport *viport); + +void viport_connect(struct viport *viport, int delay); +void viport_disconnect(struct viport *viport); + +void viport_set_link(struct viport *viport, u16 flags, u16 mtu); +void viport_get_stats(struct viport *viport, + struct net_device_stats *stats); +int viport_xmit_packet(struct viport *viport, struct sk_buff *skb); +void viport_kick(struct viport *viport); + +void viport_failure(struct viport *viport); + +int viport_set_unicast(struct viport *viport, u8 *address); +int viport_set_multicast(struct viport *viport, + struct dev_mc_list *mc_list, + int mc_count); + +#define viport_max_mtu(viport) data_max_mtu(&(viport)->data) + +#define viport_get_hw_addr(viport, address) \ + memcpy(address, (viport)->hw_mac_address, ETH_ALEN) + +#define viport_features(viport) ((viport)->features_supported) + +#define viport_can_tx_csum(viport) \ + (((viport)->features_supported & \ + (VNIC_FEAT_IPV4_CSUM_TX | VNIC_FEAT_TCP_CSUM_TX | \ + VNIC_FEAT_UDP_CSUM_TX)) == (VNIC_FEAT_IPV4_CSUM_TX | \ + VNIC_FEAT_TCP_CSUM_TX | VNIC_FEAT_UDP_CSUM_TX)) + +#endif /* VNIC_VIPORT_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:17:54 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:47:54 +0530 Subject: [ofa-general] [PATCH 04/13] QLogic VNIC: Implementation of Control path of 
communication protocol In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171754.31725.77615.stgit@localhost.localdomain> From: Poornima Kamath This patch adds the files that define the control packet formats and implements various control messages that are exchanged as part of the communication protocol with the EVIC/VEx. Signed-off-by: Ramachandra K Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_control.c | 2288 ++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_control.h | 180 ++ .../infiniband/ulp/qlgc_vnic/vnic_control_pkt.h | 368 +++ 3 files changed, 2836 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_control.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_control.c new file mode 100644 index 0000000..470f22e --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_control.c @@ -0,0 +1,2288 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_viport.h" +#include "vnic_control.h" +#include "vnic_control_pkt.h" +#include "vnic_stats.h" + +#define vnic_multicast_address(rsp2_address, index) \ + ((rsp2_address)->list_address_ops[index].address[0] & 0x01) + +static void control_log_control_packet(struct vnic_control_packet *pkt); + +static inline char *control_ifcfg_name(struct control *control) +{ + if (!control) + return "nctl"; + if (!control->parent) + return "np"; + if (!control->parent->parent) + return "npp"; + if (!control->parent->parent->parent) + return "nppp"; + if (!control->parent->parent->parent->config) + return "npppc"; + return (control->parent->parent->parent->config->name); +} + +static void control_recv(struct control *control, struct recv_io *recv_io) +{ + if (vnic_ib_post_recv(&control->ib_conn, &recv_io->io)) + viport_failure(control->parent); +} + +static void control_recv_complete(struct io *io) +{ + struct recv_io *recv_io = (struct recv_io *)io; + struct recv_io *last_recv_io; + struct control *control = &io->viport->control; + struct vnic_control_packet *pkt = control_packet(recv_io); + struct vnic_control_header *c_hdr = &pkt->hdr; + unsigned long flags; + cycles_t 
response_time; + + CONTROL_FUNCTION("%s: control_recv_complete() State=%d\n", + control_ifcfg_name(control), control->req_state); + + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + control_note_rsptime_stats(&response_time); + CONTROL_PACKET(pkt); + spin_lock_irqsave(&control->io_lock, flags); + if (c_hdr->pkt_type == TYPE_INFO) { + last_recv_io = control->info; + control->info = recv_io; + spin_unlock_irqrestore(&control->io_lock, flags); + viport_kick(control->parent); + if (last_recv_io) + control_recv(control, last_recv_io); + } else if (c_hdr->pkt_type == TYPE_RSP) { + u8 repost = 0; + u8 fail = 0; + u8 kick = 0; + + switch (control->req_state) { + case REQ_INACTIVE: + case RSP_RECEIVED: + case REQ_COMPLETED: + CONTROL_ERROR("%s: Unexpected control " + "response received: CMD = %d\n", + control_ifcfg_name(control), + c_hdr->pkt_cmd); + control_log_control_packet(pkt); + control->req_state = REQ_FAILED; + fail = 1; + break; + case REQ_POSTED: + case REQ_SENT: + if (c_hdr->pkt_cmd != control->last_cmd + || c_hdr->pkt_seq_num != control->seq_num) { + CONTROL_ERROR("%s: Incorrect Control Response " + "received\n", + control_ifcfg_name(control)); + CONTROL_ERROR("%s: Sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: Received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + control->req_state = REQ_FAILED; + fail = 1; + } else { + control->response = recv_io; + control_update_rsptime_stats(control, + response_time); + if (control->req_state == REQ_POSTED) { + CONTROL_INFO("%s: Recv CMD RSP %d " + "before Send Completion\n", + control_ifcfg_name(control), + c_hdr->pkt_cmd); + control->req_state = RSP_RECEIVED; + } else { + control->req_state = REQ_COMPLETED; + kick = 1; + } + } + break; + case REQ_FAILED: + /* stay in REQ_FAILED state */ + repost = 1; + break; + } +
spin_unlock_irqrestore(&control->io_lock, flags); + /* we must do this outside the lock */ + if (kick) + viport_kick(control->parent); + if (repost || fail) { + control_recv(control, recv_io); + if (fail) + viport_failure(control->parent); + } + + } else { + list_add_tail(&recv_io->io.list_ptrs, + &control->failure_list); + spin_unlock_irqrestore(&control->io_lock, flags); + viport_kick(control->parent); + } + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); +} + +static void control_timeout(unsigned long data) +{ + struct control *control; + unsigned long flags; + u8 fail = 0; + u8 kick = 0; + + control = (struct control *)data; + CONTROL_FUNCTION("%s: control_timeout(), State=%d\n", + control_ifcfg_name(control), control->req_state); + control->timer_state = TIMER_EXPIRED; + + spin_lock_irqsave(&control->io_lock, flags); + switch (control->req_state) { + case REQ_INACTIVE: + kick = 1; + /* stay in REQ_INACTIVE state */ + break; + case REQ_POSTED: + case REQ_SENT: + control->req_state = REQ_FAILED; + CONTROL_ERROR("%s: No send completion for Cmd=%d\n", + control_ifcfg_name(control), control->last_cmd); + control_timeout_stats(control); + fail = 1; + break; + case RSP_RECEIVED: + control->req_state = REQ_FAILED; + CONTROL_ERROR("%s: No response received from EIOC for Cmd=%d\n", + control_ifcfg_name(control), control->last_cmd); + control_timeout_stats(control); + fail = 1; + break; + case REQ_COMPLETED: + /* stay in REQ_COMPLETED state */ + kick = 1; + break; + case REQ_FAILED: + /* stay in REQ_FAILED state */ + break; + } + spin_unlock_irqrestore(&control->io_lock, flags); + /* we must do this outside the lock */ + if (fail) + viport_failure(control->parent); + if (kick) + viport_kick(control->parent); + + return; +} + +static void control_timer(struct control *control, int timeout) +{ + CONTROL_FUNCTION("%s: control_timer()\n", + control_ifcfg_name(control)); + if (control->timer_state ==
TIMER_ACTIVE) + mod_timer(&control->timer, jiffies + timeout); + else { + init_timer(&control->timer); + control->timer.expires = jiffies + timeout; + control->timer.data = (unsigned long)control; + control->timer.function = control_timeout; + control->timer_state = TIMER_ACTIVE; + add_timer(&control->timer); + } +} + +static void control_timer_stop(struct control *control) +{ + CONTROL_FUNCTION("%s: control_timer_stop()\n", + control_ifcfg_name(control)); + if (control->timer_state == TIMER_ACTIVE) + del_timer_sync(&control->timer); + + control->timer_state = TIMER_IDLE; +} + +static int control_send(struct control *control, struct send_io *send_io) +{ + unsigned long flags; + int ret = -1; + u8 fail = 0; + struct vnic_control_packet *pkt = control_packet(send_io); + + CONTROL_FUNCTION("%s: control_send(), State=%d\n", + control_ifcfg_name(control), control->req_state); + spin_lock_irqsave(&control->io_lock, flags); + switch (control->req_state) { + case REQ_INACTIVE: + CONTROL_PACKET(pkt); + control_timer(control, control->config->rsp_timeout); + control_note_reqtime_stats(control); + if (vnic_ib_post_send(&control->ib_conn, &control->send_io.io)) { + CONTROL_ERROR("%s: Failed to post send\n", + control_ifcfg_name(control)); + /* stay in REQ_INACTIVE state */ + fail = 1; + } else { + control->last_cmd = pkt->hdr.pkt_cmd; + control->req_state = REQ_POSTED; + ret = 0; + } + break; + case REQ_POSTED: + case REQ_SENT: + case RSP_RECEIVED: + case REQ_COMPLETED: + CONTROL_ERROR("%s: Previous command is not completed. " + "New CMD: %d Last CMD: %d Seq: %d\n", + control_ifcfg_name(control), pkt->hdr.pkt_cmd, + control->last_cmd, control->seq_num); + + control->req_state = REQ_FAILED; + fail = 1; + break; + case REQ_FAILED: + /* this can occur after an error when the ViPort state machine + * attempts to reset the link. + */ + CONTROL_INFO("%s: Attempt to send in failed state. "
+			     "New CMD: %d Last CMD: %d\n",
+			     control_ifcfg_name(control), pkt->hdr.pkt_cmd,
+			     control->last_cmd);
+		/* stay in REQ_FAILED state */
+		break;
+	}
+	spin_unlock_irqrestore(&control->io_lock, flags);
+
+	/* we must do this outside the lock */
+	if (fail)
+		viport_failure(control->parent);
+	return ret;
+}
+
+static void control_send_complete(struct io *io)
+{
+	struct control *control = &io->viport->control;
+	unsigned long flags;
+	u8 fail = 0;
+	u8 kick = 0;
+
+	CONTROL_FUNCTION("%s: control_send_complete(), State=%d\n",
+			 control_ifcfg_name(control), control->req_state);
+	spin_lock_irqsave(&control->io_lock, flags);
+	switch (control->req_state) {
+	case REQ_INACTIVE:
+	case REQ_SENT:
+	case REQ_COMPLETED:
+		CONTROL_ERROR("%s: Unexpected control send completion\n",
+			      control_ifcfg_name(control));
+		fail = 1;
+		control->req_state = REQ_FAILED;
+		break;
+	case REQ_POSTED:
+		control->req_state = REQ_SENT;
+		break;
+	case RSP_RECEIVED:
+		control->req_state = REQ_COMPLETED;
+		kick = 1;
+		break;
+	case REQ_FAILED:
+		/* stay in REQ_FAILED state */
+		break;
+	}
+	spin_unlock_irqrestore(&control->io_lock, flags);
+	/* we must do this outside the lock */
+	if (fail)
+		viport_failure(control->parent);
+	if (kick)
+		viport_kick(control->parent);
+
+	return;
+}
+
+void control_process_async(struct control *control)
+{
+	struct recv_io *recv_io;
+	struct vnic_control_packet *pkt;
+	unsigned long flags;
+
+	CONTROL_FUNCTION("%s: control_process_async()\n",
+			 control_ifcfg_name(control));
+	ib_dma_sync_single_for_cpu(control->parent->config->ibdev,
+				   control->recv_dma, control->recv_len,
+				   DMA_FROM_DEVICE);
+
+	spin_lock_irqsave(&control->io_lock, flags);
+	recv_io = control->info;
+	if (recv_io) {
+		CONTROL_INFO("%s: processing info packet\n",
+			     control_ifcfg_name(control));
+		control->info = NULL;
+		spin_unlock_irqrestore(&control->io_lock, flags);
+		pkt = control_packet(recv_io);
+		if (pkt->hdr.pkt_cmd == CMD_REPORT_STATUS) {
+			u32 status;
+			status =
be32_to_cpu(pkt->cmd.report_status.status_number); + switch (status) { + case VNIC_STATUS_LINK_UP: + CONTROL_INFO("%s: link up\n", + control_ifcfg_name(control)); + vnic_link_up(control->parent->vnic, + control->parent->parent); + break; + case VNIC_STATUS_LINK_DOWN: + CONTROL_INFO("%s: link down\n", + control_ifcfg_name(control)); + vnic_link_down(control->parent->vnic, + control->parent->parent); + break; + default: + CONTROL_ERROR("%s: asynchronous status" + " received from EIOC\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + break; + } + } + if ((pkt->hdr.pkt_cmd != CMD_REPORT_STATUS) || + pkt->cmd.report_status.is_fatal) + viport_failure(control->parent); + + control_recv(control, recv_io); + spin_lock_irqsave(&control->io_lock, flags); + } + + while (!list_empty(&control->failure_list)) { + CONTROL_INFO("%s: processing error packet\n", + control_ifcfg_name(control)); + recv_io = (struct recv_io *) + list_entry(control->failure_list.next, struct io, + list_ptrs); + list_del(&recv_io->io.list_ptrs); + spin_unlock_irqrestore(&control->io_lock, flags); + pkt = control_packet(recv_io); + CONTROL_ERROR("%s: asynchronous error received from EIOC\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + if ((pkt->hdr.pkt_type != TYPE_ERR) + || (pkt->hdr.pkt_cmd != CMD_REPORT_STATUS) + || pkt->cmd.report_status.is_fatal) + viport_failure(control->parent); + + control_recv(control, recv_io); + spin_lock_irqsave(&control->io_lock, flags); + } + spin_unlock_irqrestore(&control->io_lock, flags); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + CONTROL_FUNCTION("%s: done control_process_async\n", + control_ifcfg_name(control)); +} + +static struct send_io *control_init_hdr(struct control *control, u8 cmd) +{ + struct control_config *config; + struct vnic_control_packet *pkt; + struct vnic_control_header *hdr; + + CONTROL_FUNCTION("control_init_hdr()\n"); + 
config = control->config;
+
+	pkt = control_packet(&control->send_io);
+	hdr = &pkt->hdr;
+
+	hdr->pkt_type = TYPE_REQ;
+	hdr->pkt_cmd = cmd;
+	control->seq_num++;
+	hdr->pkt_seq_num = control->seq_num;
+	hdr->pkt_retry_count = 0;
+
+	return &control->send_io;
+}
+
+static struct recv_io *control_get_rsp(struct control *control)
+{
+	struct recv_io *recv_io = NULL;
+	unsigned long flags;
+	u8 fail = 0;
+
+	CONTROL_FUNCTION("%s: control_get_rsp(), State=%d\n",
+			 control_ifcfg_name(control), control->req_state);
+	spin_lock_irqsave(&control->io_lock, flags);
+	switch (control->req_state) {
+	case REQ_INACTIVE:
+		CONTROL_ERROR("%s: Checked for response with no "
+			      "command pending\n",
+			      control_ifcfg_name(control));
+		control->req_state = REQ_FAILED;
+		fail = 1;
+		break;
+	case REQ_POSTED:
+	case REQ_SENT:
+	case RSP_RECEIVED:
+		/* no response available yet;
+		 * stay in present state
+		 */
+		break;
+	case REQ_COMPLETED:
+		recv_io = control->response;
+		if (!recv_io) {
+			control->req_state = REQ_FAILED;
+			fail = 1;
+			break;
+		}
+		control->response = NULL;
+		control->last_cmd = CMD_INVALID;
+		control_timer_stop(control);
+		control->req_state = REQ_INACTIVE;
+		break;
+	case REQ_FAILED:
+		control_timer_stop(control);
+		/* stay in REQ_FAILED state */
+		break;
+	}
+	spin_unlock_irqrestore(&control->io_lock, flags);
+	if (fail)
+		viport_failure(control->parent);
+	return recv_io;
+}
+
+int control_init_vnic_req(struct control *control)
+{
+	struct send_io *send_io;
+	struct control_config *config = control->config;
+	struct vnic_control_packet *pkt;
+	struct vnic_cmd_init_vnic_req *init_vnic_req;
+
+	ib_dma_sync_single_for_cpu(control->parent->config->ibdev,
+				   control->send_dma, control->send_len,
+				   DMA_TO_DEVICE);
+
+	send_io = control_init_hdr(control, CMD_INIT_VNIC);
+	if (!send_io)
+		goto failure;
+
+	pkt = control_packet(send_io);
+	init_vnic_req = &pkt->cmd.init_vnic_req;
+	init_vnic_req->vnic_major_version =
+		__constant_cpu_to_be16(VNIC_MAJORVERSION);
+
init_vnic_req->vnic_minor_version = + __constant_cpu_to_be16(VNIC_MINORVERSION); + init_vnic_req->vnic_instance = config->vnic_instance; + init_vnic_req->num_data_paths = 1; + init_vnic_req->num_address_entries = + cpu_to_be16(config->max_address_entries); + + control->last_cmd = pkt->hdr.pkt_cmd; + CONTROL_PACKET(pkt); + + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +static int control_chk_vnic_rsp_values(struct control *control, + u16 *num_addrs, + u8 num_data_paths, + u8 num_lan_switches, + u32 *features) +{ + + struct control_config *config = control->config; + + if ((control->maj_ver > VNIC_MAJORVERSION) + || ((control->maj_ver == VNIC_MAJORVERSION) + && (control->min_ver > VNIC_MINORVERSION))) { + CONTROL_ERROR("%s: unsupported version\n", + control_ifcfg_name(control)); + goto failure; + } + if (num_data_paths != 1) { + CONTROL_ERROR("%s: EIOC returned too many datapaths\n", + control_ifcfg_name(control)); + goto failure; + } + if (*num_addrs > config->max_address_entries) { + CONTROL_ERROR("%s: EIOC returned more address" + " entries than requested\n", + control_ifcfg_name(control)); + goto failure; + } + if (*num_addrs < config->min_address_entries) { + CONTROL_ERROR("%s: not enough address entries\n", + control_ifcfg_name(control)); + goto failure; + } + if (num_lan_switches < 1) { + CONTROL_ERROR("%s: EIOC returned no lan switches\n", + control_ifcfg_name(control)); + goto failure; + } + if (num_lan_switches > 1) { + CONTROL_ERROR("%s: EIOC returned multiple lan switches\n", + control_ifcfg_name(control)); + goto failure; + } + CONTROL_ERROR("%s checking features %x ib_multicast:%d\n", + control_ifcfg_name(control), + *features, config->ib_multicast); + if ((*features & 
VNIC_FEAT_INBOUND_IB_MC) && !config->ib_multicast) { + /* disable multicast if it is not on in the cfg file, or + if we turned it off because join failed */ + *features &= ~VNIC_FEAT_INBOUND_IB_MC; + } + + return 0; +failure: + return -1; +} + +int control_init_vnic_rsp(struct control *control, u32 *features, + u8 *mac_address, u16 *num_addrs, u16 *vlan) +{ + u8 num_data_paths; + u8 num_lan_switches; + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_init_vnic_rsp *init_vnic_rsp; + + + CONTROL_FUNCTION("%s: control_init_vnic_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_INIT_VNIC) + goto failure; + + init_vnic_rsp = &pkt->cmd.init_vnic_rsp; + control->maj_ver = be16_to_cpu(init_vnic_rsp->vnic_major_version); + control->min_ver = be16_to_cpu(init_vnic_rsp->vnic_minor_version); + num_data_paths = init_vnic_rsp->num_data_paths; + num_lan_switches = init_vnic_rsp->num_lan_switches; + *features = be32_to_cpu(init_vnic_rsp->features_supported); + *num_addrs = be16_to_cpu(init_vnic_rsp->num_address_entries); + + if (control_chk_vnic_rsp_values(control, num_addrs, + num_data_paths, + num_lan_switches, + features)) + goto failure; + + control->lan_switch.lan_switch_num = + init_vnic_rsp->lan_switch[0].lan_switch_num; + control->lan_switch.num_enet_ports = + init_vnic_rsp->lan_switch[0].num_enet_ports; + control->lan_switch.default_vlan = + init_vnic_rsp->lan_switch[0].default_vlan; + *vlan = be16_to_cpu(control->lan_switch.default_vlan); + memcpy(control->lan_switch.hw_mac_address, + init_vnic_rsp->lan_switch[0].hw_mac_address, ETH_ALEN); + memcpy(mac_address, init_vnic_rsp->lan_switch[0].hw_mac_address, + ETH_ALEN); + + control_recv(control, recv_io); + 
ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +static void copy_recv_pool_config(struct vnic_recv_pool_config *src, + struct vnic_recv_pool_config *dst) +{ + dst->size_recv_pool_entry = src->size_recv_pool_entry; + dst->num_recv_pool_entries = src->num_recv_pool_entries; + dst->timeout_before_kick = src->timeout_before_kick; + dst->num_recv_pool_entries_before_kick = + src->num_recv_pool_entries_before_kick; + dst->num_recv_pool_bytes_before_kick = + src->num_recv_pool_bytes_before_kick; + dst->free_recv_pool_entries_per_update = + src->free_recv_pool_entries_per_update; +} + +static int check_recv_pool_config_value(__be32 *src, __be32 *dst, + __be32 *max, __be32 *min, + char *name) +{ + u32 value; + + value = be32_to_cpu(*src); + if (value > be32_to_cpu(*max)) { + CONTROL_ERROR("value %s too large\n", name); + return -1; + } else if (value < be32_to_cpu(*min)) { + CONTROL_ERROR("value %s too small\n", name); + return -1; + } + + *dst = cpu_to_be32(value); + return 0; +} + +static int check_recv_pool_config(struct vnic_recv_pool_config *src, + struct vnic_recv_pool_config *dst, + struct vnic_recv_pool_config *max, + struct vnic_recv_pool_config *min) +{ + if (check_recv_pool_config_value(&src->size_recv_pool_entry, + &dst->size_recv_pool_entry, + &max->size_recv_pool_entry, + &min->size_recv_pool_entry, + "size_recv_pool_entry") + || check_recv_pool_config_value(&src->num_recv_pool_entries, + &dst->num_recv_pool_entries, + &max->num_recv_pool_entries, + &min->num_recv_pool_entries, + "num_recv_pool_entries") + || check_recv_pool_config_value(&src->timeout_before_kick, + &dst->timeout_before_kick, + &max->timeout_before_kick, + &min->timeout_before_kick, + "timeout_before_kick") + || 
check_recv_pool_config_value(&src-> + num_recv_pool_entries_before_kick, + &dst-> + num_recv_pool_entries_before_kick, + &max-> + num_recv_pool_entries_before_kick, + &min-> + num_recv_pool_entries_before_kick, + "num_recv_pool_entries_before_kick") + || check_recv_pool_config_value(&src-> + num_recv_pool_bytes_before_kick, + &dst-> + num_recv_pool_bytes_before_kick, + &max-> + num_recv_pool_bytes_before_kick, + &min-> + num_recv_pool_bytes_before_kick, + "num_recv_pool_bytes_before_kick") + || check_recv_pool_config_value(&src-> + free_recv_pool_entries_per_update, + &dst-> + free_recv_pool_entries_per_update, + &max-> + free_recv_pool_entries_per_update, + &min-> + free_recv_pool_entries_per_update, + "free_recv_pool_entries_per_update")) + goto failure; + + if (!is_power_of2(be32_to_cpu(dst->num_recv_pool_entries))) { + CONTROL_ERROR("num_recv_pool_entries (%d)" + " must be power of 2\n", + dst->num_recv_pool_entries); + goto failure; + } + + if (!is_power_of2(be32_to_cpu(dst-> + free_recv_pool_entries_per_update))) { + CONTROL_ERROR("free_recv_pool_entries_per_update (%d)" + " must be power of 2\n", + dst->free_recv_pool_entries_per_update); + goto failure; + } + + if (be32_to_cpu(dst->free_recv_pool_entries_per_update) >= + be32_to_cpu(dst->num_recv_pool_entries)) { + CONTROL_ERROR("free_recv_pool_entries_per_update (%d) must" + " be less than num_recv_pool_entries (%d)\n", + dst->free_recv_pool_entries_per_update, + dst->num_recv_pool_entries); + goto failure; + } + + if (be32_to_cpu(dst->num_recv_pool_entries_before_kick) >= + be32_to_cpu(dst->num_recv_pool_entries)) { + CONTROL_ERROR("num_recv_pool_entries_before_kick (%d) must" + " be less than num_recv_pool_entries (%d)\n", + dst->num_recv_pool_entries_before_kick, + dst->num_recv_pool_entries); + goto failure; + } + + return 0; +failure: + return -1; +} + +int control_config_data_path_req(struct control *control, u64 path_id, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc) +{ 
+ struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_data_path *config_data_path; + + CONTROL_FUNCTION("%s: control_config_data_path_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_CONFIG_DATA_PATH); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + config_data_path = &pkt->cmd.config_data_path_req; + config_data_path->data_path = 0; + config_data_path->path_identifier = path_id; + copy_recv_pool_config(host, + &config_data_path->host_recv_pool_config); + copy_recv_pool_config(eioc, + &config_data_path->eioc_recv_pool_config); + CONTROL_PACKET(pkt); + + control->last_cmd = pkt->hdr.pkt_cmd; + + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_config_data_path_rsp(struct control *control, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc, + struct vnic_recv_pool_config *max_host, + struct vnic_recv_pool_config *max_eioc, + struct vnic_recv_pool_config *min_host, + struct vnic_recv_pool_config *min_eioc) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_data_path *config_data_path; + + CONTROL_FUNCTION("%s: control_config_data_path_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_CONFIG_DATA_PATH) + goto failure; + + config_data_path = &pkt->cmd.config_data_path_rsp; + if 
(config_data_path->data_path != 0) { + CONTROL_ERROR("%s: received CMD_CONFIG_DATA_PATH response" + " for wrong data path: %u\n", + control_ifcfg_name(control), + config_data_path->data_path); + goto failure; + } + + if (check_recv_pool_config(&config_data_path-> + host_recv_pool_config, + host, max_host, min_host) + || check_recv_pool_config(&config_data_path-> + eioc_recv_pool_config, + eioc, max_eioc, min_eioc)) { + goto failure; + } + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_exchange_pools_req(struct control *control, u64 addr, u32 rkey) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_exchange_pools *exchange_pools; + + CONTROL_FUNCTION("%s: control_exchange_pools_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_EXCHANGE_POOLS); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + exchange_pools = &pkt->cmd.exchange_pools_req; + exchange_pools->data_path = 0; + exchange_pools->pool_rkey = cpu_to_be32(rkey); + exchange_pools->pool_addr = cpu_to_be64(addr); + + control->last_cmd = pkt->hdr.pkt_cmd; + + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_exchange_pools_rsp(struct control *control, u64 *addr, + u32 *rkey) +{ + struct recv_io *recv_io; + struct 
vnic_control_packet *pkt; + struct vnic_cmd_exchange_pools *exchange_pools; + + CONTROL_FUNCTION("%s: control_exchange_pools_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_EXCHANGE_POOLS) + goto failure; + + exchange_pools = &pkt->cmd.exchange_pools_rsp; + *rkey = be32_to_cpu(exchange_pools->pool_rkey); + *addr = be64_to_cpu(exchange_pools->pool_addr); + + if (exchange_pools->data_path != 0) { + CONTROL_ERROR("%s: received CMD_EXCHANGE_POOLS response" + " for wrong data path: %u\n", + control_ifcfg_name(control), + exchange_pools->data_path); + goto failure; + } + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_config_link_req(struct control *control, u16 flags, u16 mtu) +{ + struct send_io *send_io; + struct vnic_cmd_config_link *config_link_req; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_config_link_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_CONFIG_LINK); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + config_link_req = &pkt->cmd.config_link_req; + config_link_req->lan_switch_num = + control->lan_switch.lan_switch_num; + config_link_req->cmd_flags = VNIC_FLAG_SET_MTU; + if (flags & IFF_UP) + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_NIC; + else + config_link_req->cmd_flags |= 
VNIC_FLAG_DISABLE_NIC; + if (flags & IFF_ALLMULTI) + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_MCAST_ALL; + else + config_link_req->cmd_flags |= VNIC_FLAG_DISABLE_MCAST_ALL; + if (flags & IFF_PROMISC) { + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_PROMISC; + /* the EIOU doesn't really do PROMISC mode. + * if PROMISC is set, it only receives unicast packets + * I also have to set MCAST_ALL if I want real + * PROMISC mode. + */ + config_link_req->cmd_flags &= ~VNIC_FLAG_DISABLE_MCAST_ALL; + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_MCAST_ALL; + } else + config_link_req->cmd_flags |= VNIC_FLAG_DISABLE_PROMISC; + + config_link_req->mtu_size = cpu_to_be16(mtu); + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_config_link_rsp(struct control *control, u16 *flags, u16 *mtu) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_link *config_link_rsp; + + CONTROL_FUNCTION("%s: control_config_link_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_CONFIG_LINK) + goto failure; + config_link_rsp = &pkt->cmd.config_link_rsp; + if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_NIC) + *flags |= IFF_UP; + if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_MCAST_ALL) + *flags |= IFF_ALLMULTI; + if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_PROMISC) + *flags |= IFF_PROMISC; + + *mtu = be16_to_cpu(config_link_rsp->mtu_size); + + if (control->parent->features_supported & 
VNIC_FEAT_INBOUND_IB_MC) {
+		/* features_supported might include INBOUND_IB_MC, but the
+		 * MTU might cause it to be auto-disabled at the embedded side
+		 */
+		if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_MCAST_ALL) {
+			union ib_gid mgid = config_link_rsp->allmulti_mgid;
+			if (mgid.raw[0] != 0xff) {
+				CONTROL_ERROR("%s: invalid format prefix "
+					      VNIC_GID_FMT "\n",
+					      control_ifcfg_name(control),
+					      VNIC_GID_RAW_ARG(mgid.raw));
+			} else {
+				/* rather than issuing the join here, which
+				 * might arrive at the SM before the EVIC
+				 * creates the MC group, postpone it.
+				 */
+				vnic_mc_join_setup(control->parent, &mgid);
+				CONTROL_INFO("join setup for ALL_MULTI\n");
+			}
+		}
+		/* we don't want to leave the mcast group if MCAST_ALL is
+		 * disabled, because there are no doubt multicast addresses
+		 * set and we want to stay joined so we can get that traffic
+		 * via the mcast group.
+		 */
+	}
+
+	control_recv(control, recv_io);
+	ib_dma_sync_single_for_device(control->parent->config->ibdev,
+				      control->recv_dma, control->recv_len,
+				      DMA_FROM_DEVICE);
+	return 0;
+failure:
+	viport_failure(control->parent);
+out:
+	ib_dma_sync_single_for_device(control->parent->config->ibdev,
+				      control->recv_dma, control->recv_len,
+				      DMA_FROM_DEVICE);
+	return -1;
+}
+
+/* control_config_addrs_req:
+ * return values:
+ *	-1: failure
+ *	 0: incomplete (successful operation, but more address
+ *	    table entries to be updated)
+ *	 1: complete
+ */
+int control_config_addrs_req(struct control *control,
+			     struct vnic_address_op2 *addrs, u16 num)
+{
+	u16 i;
+	u8 j;
+	int ret = 1;
+	struct send_io *send_io;
+	struct vnic_control_packet *pkt;
+	struct vnic_cmd_config_addresses *config_addrs_req;
+	struct vnic_cmd_config_addresses2 *config_addrs_req2;
+
+	CONTROL_FUNCTION("%s: control_config_addrs_req()\n",
+			 control_ifcfg_name(control));
+	ib_dma_sync_single_for_cpu(control->parent->config->ibdev,
+				   control->send_dma, control->send_len,
+				   DMA_TO_DEVICE);
+
+	if (control->parent->features_supported & VNIC_FEAT_INBOUND_IB_MC) {
+
CONTROL_INFO("Sending CMD_CONFIG_ADDRESSES2 %lx MAX:%d " + "sizes:%d %d(off:%d) sizes2:%d %d %d" + "(off:%d - %d %d %d %d %d %d %d)\n", jiffies, + (int)MAX_CONFIG_ADDR_ENTRIES2, + (int)sizeof(struct vnic_cmd_config_addresses), + (int)sizeof(struct vnic_address_op), + (int)offsetof(struct vnic_cmd_config_addresses, + list_address_ops), + (int)sizeof(struct vnic_cmd_config_addresses2), + (int)sizeof(struct vnic_address_op2), + (int)sizeof(union ib_gid), + (int)offsetof(struct vnic_cmd_config_addresses2, + list_address_ops), + (int)offsetof(struct vnic_address_op2, index), + (int)offsetof(struct vnic_address_op2, operation), + (int)offsetof(struct vnic_address_op2, valid), + (int)offsetof(struct vnic_address_op2, address), + (int)offsetof(struct vnic_address_op2, vlan), + (int)offsetof(struct vnic_address_op2, reserved), + (int)offsetof(struct vnic_address_op2, mgid) + ); + send_io = control_init_hdr(control, CMD_CONFIG_ADDRESSES2); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + config_addrs_req2 = &pkt->cmd.config_addresses_req2; + memset(pkt->cmd.cmd_data, 0, VNIC_MAX_CONTROLDATASZ); + config_addrs_req2->lan_switch_num = + control->lan_switch.lan_switch_num; + for (i = 0, j = 0; (i < num) && (j < MAX_CONFIG_ADDR_ENTRIES2); i++) { + if (!addrs[i].operation) + continue; + config_addrs_req2->list_address_ops[j].index = + cpu_to_be16(i); + config_addrs_req2->list_address_ops[j].operation = + VNIC_OP_SET_ENTRY; + config_addrs_req2->list_address_ops[j].valid = + addrs[i].valid; + memcpy(config_addrs_req2->list_address_ops[j].address, + addrs[i].address, ETH_ALEN); + config_addrs_req2->list_address_ops[j].vlan = + addrs[i].vlan; + addrs[i].operation = 0; + CONTROL_INFO("%s i=%d " + "addr[%d]=%02x:%02x:%02x:%02x:%02x:%02x " + "valid:%d\n", control_ifcfg_name(control), i, j, + addrs[i].address[0], addrs[i].address[1], + addrs[i].address[2], addrs[i].address[3], + addrs[i].address[4], addrs[i].address[5], + addrs[i].valid); + j++; + } + 
config_addrs_req2->num_address_ops = j; + } else { + send_io = control_init_hdr(control, CMD_CONFIG_ADDRESSES); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + config_addrs_req = &pkt->cmd.config_addresses_req; + config_addrs_req->lan_switch_num = + control->lan_switch.lan_switch_num; + for (i = 0, j = 0; (i < num) && (j < 16); i++) { + if (!addrs[i].operation) + continue; + config_addrs_req->list_address_ops[j].index = + cpu_to_be16(i); + config_addrs_req->list_address_ops[j].operation = + VNIC_OP_SET_ENTRY; + config_addrs_req->list_address_ops[j].valid = + addrs[i].valid; + memcpy(config_addrs_req->list_address_ops[j].address, + addrs[i].address, ETH_ALEN); + config_addrs_req->list_address_ops[j].vlan = + addrs[i].vlan; + addrs[i].operation = 0; + j++; + } + config_addrs_req->num_address_ops = j; + } + for (; i < num; i++) { + if (addrs[i].operation) { + ret = 0; + break; + } + } + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + if (control_send(control, send_io)) + return -1; + return ret; +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +static int process_cmd_config_address2_rsp(struct control *control, + struct vnic_control_packet *pkt, + struct recv_io *recv_io) +{ + struct vnic_cmd_config_addresses2 *config_addrs_rsp2; + int idx, mcaddrs, nomgid; + union ib_gid mgid, rsp_mgid; + + config_addrs_rsp2 = &pkt->cmd.config_addresses_rsp2; + CONTROL_INFO("%s rsp to CONFIG_ADDRESSES2\n", + control_ifcfg_name(control)); + + for (idx = 0, mcaddrs = 0, nomgid = 1; + idx < config_addrs_rsp2->num_address_ops; + idx++) { + if (!config_addrs_rsp2->list_address_ops[idx].valid) + continue; + + /* check if address is multicasts */ + if (!vnic_multicast_address(config_addrs_rsp2, idx)) + continue; + + mcaddrs++; + mgid = 
config_addrs_rsp2->list_address_ops[idx].mgid;
+		CONTROL_INFO("%s: got mgid " VNIC_GID_FMT
+			     " MCAST_MSG_SIZE:%d mtu:%d\n",
+			     control_ifcfg_name(control),
+			     VNIC_GID_RAW_ARG(mgid.raw),
+			     (int)MCAST_MSG_SIZE,
+			     control->parent->mtu);
+
+		/* Embedded should have turned off multicast
+		 * due to large MTU size; mgid had better be 0.
+		 */
+		if (control->parent->mtu > MCAST_MSG_SIZE) {
+			if ((mgid.global.subnet_prefix != 0) ||
+			    (mgid.global.interface_id != 0)) {
+				CONTROL_ERROR("%s: invalid mgid; "
+					      "expected 0 "
+					      VNIC_GID_FMT "\n",
+					      control_ifcfg_name(control),
+					      VNIC_GID_RAW_ARG(mgid.raw));
+			}
+			continue;
+		}
+		if (mgid.raw[0] != 0xff) {
+			CONTROL_ERROR("%s: invalid format prefix "
+				      VNIC_GID_FMT "\n",
+				      control_ifcfg_name(control),
+				      VNIC_GID_RAW_ARG(mgid.raw));
+			continue;
+		}
+		nomgid = 0;	/* got a valid mgid */
+
+		/* let's verify that all the mgids match this one */
+		for (; idx < config_addrs_rsp2->num_address_ops; idx++) {
+			if (!config_addrs_rsp2->list_address_ops[idx].valid)
+				continue;
+
+			/* check if the address is multicast */
+			if (!vnic_multicast_address(config_addrs_rsp2, idx))
+				continue;
+
+			rsp_mgid = config_addrs_rsp2->list_address_ops[idx].mgid;
+			if (memcmp(&mgid, &rsp_mgid, sizeof(union ib_gid)) == 0)
+				continue;
+
+			CONTROL_ERROR("%s: Multicast Group MGIDs not "
+				      "unique; mgids: " VNIC_GID_FMT
+				      " " VNIC_GID_FMT "\n",
+				      control_ifcfg_name(control),
+				      VNIC_GID_RAW_ARG(mgid.raw),
+				      VNIC_GID_RAW_ARG(rsp_mgid.raw));
+			return 1;
+		}
+
+		/* rather than issuing the join here, which might arrive
+		 * at the SM before the EVIC creates the MC group,
+		 * postpone it.
+		 */
+		vnic_mc_join_setup(control->parent, &mgid);
+
+		/* there is only one multicast group to join, so we're done */
+		break;
+	}
+
+	/* we sent at least one multicast address but got no MGID
+	 * back; so, if it is not the allmulti case, leave the group
+	 * we joined before.
(for allmulti case we have to stay + * joined) + */ + if ((config_addrs_rsp2->num_address_ops > 0) && (mcaddrs > 0) && + nomgid && !(control->parent->flags & IFF_ALLMULTI)) { + CONTROL_INFO("numaddrops:%d mcadrs:%d nomgid:%d\n", + config_addrs_rsp2->num_address_ops, + mcaddrs > 0, nomgid); + + vnic_mc_leave(control->parent); + } + + return 0; +} + +int control_config_addrs_rsp(struct control *control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_config_addrs_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if ((pkt->hdr.pkt_cmd != CMD_CONFIG_ADDRESSES) && + (pkt->hdr.pkt_cmd != CMD_CONFIG_ADDRESSES2)) + goto failure; + + if (((pkt->hdr.pkt_cmd == CMD_CONFIG_ADDRESSES2) && + !control->parent->features_supported & VNIC_FEAT_INBOUND_IB_MC) || + ((pkt->hdr.pkt_cmd == CMD_CONFIG_ADDRESSES) && + control->parent->features_supported & VNIC_FEAT_INBOUND_IB_MC)) { + CONTROL_ERROR("%s unexpected response pktCmd:%d flag:%x\n", + control_ifcfg_name(control), pkt->hdr.pkt_cmd, + control->parent->features_supported & + VNIC_FEAT_INBOUND_IB_MC); + goto failure; + } + + if (pkt->hdr.pkt_cmd == CMD_CONFIG_ADDRESSES2) { + if (process_cmd_config_address2_rsp(control, pkt, recv_io)) + goto failure; + } else { + struct vnic_cmd_config_addresses *config_addrs_rsp; + config_addrs_rsp = &pkt->cmd.config_addresses_rsp; + } + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_report_statistics_req(struct 
control *control) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_report_stats_req *report_statistics_req; + + CONTROL_FUNCTION("%s: control_report_statistics_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_REPORT_STATISTICS); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + report_statistics_req = &pkt->cmd.report_statistics_req; + report_statistics_req->lan_switch_num = + control->lan_switch.lan_switch_num; + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_report_statistics_rsp(struct control *control, + struct vnic_cmd_report_stats_rsp *stats) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_report_stats_rsp *rep_stat_rsp; + + CONTROL_FUNCTION("%s: control_report_statistics_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_REPORT_STATISTICS) + goto failure; + + rep_stat_rsp = &pkt->cmd.report_statistics_rsp; + + stats->if_in_broadcast_pkts = rep_stat_rsp->if_in_broadcast_pkts; + stats->if_in_multicast_pkts = rep_stat_rsp->if_in_multicast_pkts; + stats->if_in_octets = rep_stat_rsp->if_in_octets; + stats->if_in_ucast_pkts = rep_stat_rsp->if_in_ucast_pkts; + stats->if_in_nucast_pkts = rep_stat_rsp->if_in_nucast_pkts; + stats->if_in_underrun = 
rep_stat_rsp->if_in_underrun; + stats->if_in_errors = rep_stat_rsp->if_in_errors; + stats->if_out_errors = rep_stat_rsp->if_out_errors; + stats->if_out_octets = rep_stat_rsp->if_out_octets; + stats->if_out_ucast_pkts = rep_stat_rsp->if_out_ucast_pkts; + stats->if_out_multicast_pkts = rep_stat_rsp->if_out_multicast_pkts; + stats->if_out_broadcast_pkts = rep_stat_rsp->if_out_broadcast_pkts; + stats->if_out_nucast_pkts = rep_stat_rsp->if_out_nucast_pkts; + stats->if_out_ok = rep_stat_rsp->if_out_ok; + stats->if_in_ok = rep_stat_rsp->if_in_ok; + stats->if_out_ucast_bytes = rep_stat_rsp->if_out_ucast_bytes; + stats->if_out_multicast_bytes = rep_stat_rsp->if_out_multicast_bytes; + stats->if_out_broadcast_bytes = rep_stat_rsp->if_out_broadcast_bytes; + stats->if_in_ucast_bytes = rep_stat_rsp->if_in_ucast_bytes; + stats->if_in_multicast_bytes = rep_stat_rsp->if_in_multicast_bytes; + stats->if_in_broadcast_bytes = rep_stat_rsp->if_in_broadcast_bytes; + stats->ethernet_status = rep_stat_rsp->ethernet_status; + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_reset_req(struct control *control) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_reset_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_RESET); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return 
control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_reset_rsp(struct control *control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_reset_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_RESET) + goto failure; + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_heartbeat_req(struct control *control, u32 hb_interval) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_heartbeat *heartbeat_req; + + CONTROL_FUNCTION("%s: control_heartbeat_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_HEARTBEAT); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + heartbeat_req = &pkt->cmd.heartbeat_req; + heartbeat_req->hb_interval = cpu_to_be32(hb_interval); + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int 
control_heartbeat_rsp(struct control *control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_heartbeat *heartbeat_rsp; + + CONTROL_FUNCTION("%s: control_heartbeat_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_HEARTBEAT) + goto failure; + + heartbeat_rsp = &pkt->cmd.heartbeat_rsp; + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +static int control_init_recv_ios(struct control *control, + struct viport *viport, + struct vnic_control_packet *pkt) +{ + struct io *io; + struct ib_device *ibdev = viport->config->ibdev; + struct control_config *config = control->config; + dma_addr_t recv_dma; + unsigned int i; + + + control->recv_len = sizeof *pkt * config->num_recvs; + control->recv_dma = ib_dma_map_single(ibdev, + pkt, control->recv_len, + DMA_FROM_DEVICE); + + if (ib_dma_mapping_error(ibdev, control->recv_dma)) { + CONTROL_ERROR("control recv dma map error\n"); + goto failure; + } + + recv_dma = control->recv_dma; + for (i = 0; i < config->num_recvs; i++) { + io = &control->recv_ios[i].io; + io->viport = viport; + io->routine = control_recv_complete; + io->type = RECV; + + control->recv_ios[i].virtual_addr = (u8 *)pkt; + control->recv_ios[i].list.addr = recv_dma; + control->recv_ios[i].list.length = sizeof *pkt; + control->recv_ios[i].list.lkey = control->mr->lkey; + + recv_dma = recv_dma + sizeof *pkt; + pkt++; + + io->rwr.wr_id = (u64)io; + io->rwr.sg_list = 
&control->recv_ios[i].list; + io->rwr.num_sge = 1; + if (vnic_ib_post_recv(&control->ib_conn, io)) + goto unmap_recv; + } + + return 0; +unmap_recv: + ib_dma_unmap_single(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); +failure: + return -1; +} + +static int control_init_send_ios(struct control *control, + struct viport *viport, + struct vnic_control_packet *pkt) +{ + struct io *io; + struct ib_device *ibdev = viport->config->ibdev; + + control->send_io.virtual_addr = (u8 *)pkt; + control->send_len = sizeof *pkt; + control->send_dma = ib_dma_map_single(ibdev, pkt, + control->send_len, + DMA_TO_DEVICE); + if (ib_dma_mapping_error(ibdev, control->send_dma)) { + CONTROL_ERROR("control send dma map error\n"); + goto failure; + } + + io = &control->send_io.io; + io->viport = viport; + io->routine = control_send_complete; + + control->send_io.list.addr = control->send_dma; + control->send_io.list.length = sizeof *pkt; + control->send_io.list.lkey = control->mr->lkey; + + io->swr.wr_id = (u64)io; + io->swr.sg_list = &control->send_io.list; + io->swr.num_sge = 1; + io->swr.opcode = IB_WR_SEND; + io->swr.send_flags = IB_SEND_SIGNALED; + io->type = SEND; + + return 0; +failure: + return -1; +} + +int control_init(struct control *control, struct viport *viport, + struct control_config *config, struct ib_pd *pd) +{ + struct vnic_control_packet *pkt; + unsigned int sz; + + CONTROL_FUNCTION("%s: control_init()\n", + control_ifcfg_name(control)); + control->parent = viport; + control->config = config; + control->ib_conn.viport = viport; + control->ib_conn.ib_config = &config->ib_config; + control->ib_conn.state = IB_CONN_UNINITTED; + control->ib_conn.callback_thread = NULL; + control->ib_conn.callback_thread_end = 0; + control->req_state = REQ_INACTIVE; + control->last_cmd = CMD_INVALID; + control->seq_num = 0; + control->response = NULL; + control->info = NULL; + INIT_LIST_HEAD(&control->failure_list); + 
spin_lock_init(&control->io_lock);
+
+	if (vnic_ib_conn_init(&control->ib_conn, viport, pd,
+			      &config->ib_config)) {
+		CONTROL_ERROR("Control IB connection"
+			      " initialization failed\n");
+		goto failure;
+	}
+
+	control->mr = ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(control->mr)) {
+		CONTROL_ERROR("%s: failed to register memory"
+			      " for control connection\n",
+			      control_ifcfg_name(control));
+		goto destroy_conn;
+	}
+
+	control->ib_conn.cm_id = ib_create_cm_id(viport->config->ibdev,
+						 vnic_ib_cm_handler,
+						 &control->ib_conn);
+	if (IS_ERR(control->ib_conn.cm_id)) {
+		CONTROL_ERROR("creating control CM ID failed\n");
+		goto destroy_mr;
+	}
+
+	sz = sizeof(struct recv_io) * config->num_recvs;
+	control->recv_ios = vmalloc(sz);
+
+	if (!control->recv_ios) {
+		CONTROL_ERROR("%s: failed allocating space for recv ios\n",
+			      control_ifcfg_name(control));
+		goto destroy_cm_id;
+	}
+
+	memset(control->recv_ios, 0, sz);
+	/* One send buffer and num_recvs recv buffers */
+	control->local_storage = kzalloc(sizeof *pkt *
+					 (config->num_recvs + 1),
+					 GFP_KERNEL);
+
+	if (!control->local_storage) {
+		CONTROL_ERROR("%s: failed allocating space"
+			      " for local storage\n",
+			      control_ifcfg_name(control));
+		goto free_recv_ios;
+	}
+
+	pkt = control->local_storage;
+	if (control_init_send_ios(control, viport, pkt))
+		goto free_storage;
+
+	pkt++;
+	if (control_init_recv_ios(control, viport, pkt))
+		goto unmap_send;
+
+	return 0;
+
+unmap_send:
+	ib_dma_unmap_single(control->parent->config->ibdev,
+			    control->send_dma, control->send_len,
+			    DMA_TO_DEVICE);
+free_storage:
+	kfree(control->local_storage);
+free_recv_ios:
+	vfree(control->recv_ios);
+destroy_cm_id:
+	ib_destroy_cm_id(control->ib_conn.cm_id);
+destroy_mr:
+	ib_dereg_mr(control->mr);
+destroy_conn:
+	ib_destroy_qp(control->ib_conn.qp);
+	ib_destroy_cq(control->ib_conn.cq);
+failure:
+	return -1;
+}
+
+void control_cleanup(struct control *control)
+{
+	CONTROL_FUNCTION("%s: control_cleanup()\n",
+
control_ifcfg_name(control)); + + if (ib_send_cm_dreq(control->ib_conn.cm_id, NULL, 0)) + printk(KERN_DEBUG "control CM DREQ sending failed\n"); + + control->ib_conn.state = IB_CONN_DISCONNECTED; + control_timer_stop(control); + control->req_state = REQ_INACTIVE; + control->response = NULL; + control->last_cmd = CMD_INVALID; + completion_callback_cleanup(&control->ib_conn); + ib_destroy_cm_id(control->ib_conn.cm_id); + ib_destroy_qp(control->ib_conn.qp); + ib_destroy_cq(control->ib_conn.cq); + ib_dereg_mr(control->mr); + ib_dma_unmap_single(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + ib_dma_unmap_single(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + vfree(control->recv_ios); + kfree(control->local_storage); + +} + +static void control_log_report_status_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_REPORT_STATUS\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO + " lan_switch_num = %u, is_fatal = %u\n", + pkt->cmd.report_status.lan_switch_num, + pkt->cmd.report_status.is_fatal); + printk(KERN_INFO + " status_number = %u, status_info = %u\n", + be32_to_cpu(pkt->cmd.report_status.status_number), + be32_to_cpu(pkt->cmd.report_status.status_info)); + pkt->cmd.report_status.file_name[31] = '\0'; + pkt->cmd.report_status.routine[31] = '\0'; + printk(KERN_INFO " filename = %s, routine = %s\n", + pkt->cmd.report_status.file_name, + pkt->cmd.report_status.routine); + printk(KERN_INFO + " line_num = %u, error_parameter = %u\n", + be32_to_cpu(pkt->cmd.report_status.line_num), + be32_to_cpu(pkt->cmd.report_status.error_parameter)); + pkt->cmd.report_status.desc_text[127] = '\0'; + printk(KERN_INFO " desc_text = %s\n", + pkt->cmd.report_status.desc_text); +} + +static void control_log_report_stats_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " 
pkt_cmd = CMD_REPORT_STATISTICS\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " lan_switch_num = %u\n", + pkt->cmd.report_statistics_req.lan_switch_num); + if (pkt->hdr.pkt_type == TYPE_REQ) + return; + printk(KERN_INFO " if_in_broadcast_pkts = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_broadcast_pkts)); + printk(" if_in_multicast_pkts = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_multicast_pkts)); + printk(KERN_INFO " if_in_octets = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_octets)); + printk(" if_in_ucast_pkts = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_ucast_pkts)); + printk(KERN_INFO " if_in_nucast_pkts = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_nucast_pkts)); + printk(" if_in_underrun = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_underrun)); + printk(KERN_INFO " if_in_errors = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_errors)); + printk(" if_out_errors = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_errors)); + printk(KERN_INFO " if_out_octets = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_octets)); + printk(" if_out_ucast_pkts = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_ucast_pkts)); + printk(KERN_INFO " if_out_multicast_pkts = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_multicast_pkts)); + printk(" if_out_broadcast_pkts = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_broadcast_pkts)); + printk(KERN_INFO " if_out_nucast_pkts = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. 
+ if_out_nucast_pkts)); + printk(" if_out_ok = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp.if_out_ok)); + printk(KERN_INFO " if_in_ok = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp.if_in_ok)); + printk(" if_out_ucast_bytes = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_ucast_bytes)); + printk(KERN_INFO " if_out_multicast_bytes = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_multicast_bytes)); + printk(" if_out_broadcast_bytes = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_broadcast_bytes)); + printk(KERN_INFO " if_in_ucast_bytes = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_ucast_bytes)); + printk(" if_in_multicast_bytes = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_multicast_bytes)); + printk(KERN_INFO " if_in_broadcast_bytes = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_broadcast_bytes)); + printk(" ethernet_status = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + ethernet_status)); +} + +static void control_log_config_link_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_CONFIG_LINK\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " cmd_flags = %x\n", + pkt->cmd.config_link_req.cmd_flags); + if (pkt->cmd.config_link_req.cmd_flags & VNIC_FLAG_ENABLE_NIC) + printk(KERN_INFO + " VNIC_FLAG_ENABLE_NIC\n"); + if (pkt->cmd.config_link_req.cmd_flags & VNIC_FLAG_DISABLE_NIC) + printk(KERN_INFO + " VNIC_FLAG_DISABLE_NIC\n"); + if (pkt->cmd.config_link_req. + cmd_flags & VNIC_FLAG_ENABLE_MCAST_ALL) + printk(KERN_INFO + " VNIC_FLAG_ENABLE_" + "MCAST_ALL\n"); + if (pkt->cmd.config_link_req. + cmd_flags & VNIC_FLAG_DISABLE_MCAST_ALL) + printk(KERN_INFO + " VNIC_FLAG_DISABLE_" + "MCAST_ALL\n"); + if (pkt->cmd.config_link_req. 
+ cmd_flags & VNIC_FLAG_ENABLE_PROMISC) + printk(KERN_INFO + " VNIC_FLAG_ENABLE_" + "PROMISC\n"); + if (pkt->cmd.config_link_req. + cmd_flags & VNIC_FLAG_DISABLE_PROMISC) + printk(KERN_INFO + " VNIC_FLAG_DISABLE_" + "PROMISC\n"); + if (pkt->cmd.config_link_req.cmd_flags & VNIC_FLAG_SET_MTU) + printk(KERN_INFO + " VNIC_FLAG_SET_MTU\n"); + printk(KERN_INFO + " lan_switch_num = %x, mtu_size = %d\n", + pkt->cmd.config_link_req.lan_switch_num, + be16_to_cpu(pkt->cmd.config_link_req.mtu_size)); + if (pkt->hdr.pkt_type == TYPE_RSP) { + printk(KERN_INFO + " default_vlan = %u," + " hw_mac_address =" + " %02x:%02x:%02x:%02x:%02x:%02x\n", + be16_to_cpu(pkt->cmd.config_link_req. + default_vlan), + pkt->cmd.config_link_req.hw_mac_address[0], + pkt->cmd.config_link_req.hw_mac_address[1], + pkt->cmd.config_link_req.hw_mac_address[2], + pkt->cmd.config_link_req.hw_mac_address[3], + pkt->cmd.config_link_req.hw_mac_address[4], + pkt->cmd.config_link_req.hw_mac_address[5]); + } +} + +static void print_config_addr(struct vnic_address_op *list, + int num_address_ops, size_t mgidoff) +{ + int i = 0; + + while (i < num_address_ops && i < 16) { + printk(KERN_INFO " list_address_ops[%u].index" + " = %u\n", i, be16_to_cpu(list->index)); + switch (list->operation) { + case VNIC_OP_GET_ENTRY: + printk(KERN_INFO " list_address_ops[%u]." + "operation = VNIC_OP_GET_ENTRY\n", i); + break; + case VNIC_OP_SET_ENTRY: + printk(KERN_INFO " list_address_ops[%u]." + "operation = VNIC_OP_SET_ENTRY\n", i); + break; + default: + printk(KERN_INFO " list_address_ops[%u]." 
+ "operation = UNKNOWN(%d)\n", i, + list->operation); + break; + } + printk(KERN_INFO " list_address_ops[%u].valid" + " = %u\n", i, list->valid); + printk(KERN_INFO " list_address_ops[%u].address" + " = %02x:%02x:%02x:%02x:%02x:%02x\n", i, + list->address[0], list->address[1], + list->address[2], list->address[3], + list->address[4], list->address[5]); + printk(KERN_INFO " list_address_ops[%u].vlan" + " = %u\n", i, be16_to_cpu(list->vlan)); + if (mgidoff) { + printk(KERN_INFO + " list_address_ops[%u].mgid" + " = " VNIC_GID_FMT "\n", i, + VNIC_GID_RAW_ARG((char *)list + mgidoff)); + list = (struct vnic_address_op *) + ((char *)list + sizeof(struct vnic_address_op2)); + } else + list = (struct vnic_address_op *) + ((char *)list + sizeof(struct vnic_address_op)); + i++; + } +} + +static void control_log_config_addrs_pkt(struct vnic_control_packet *pkt, + u8 addresses2) +{ + struct vnic_address_op *list; + int no_address_ops; + + if (addresses2) + printk(KERN_INFO + " pkt_cmd = CMD_CONFIG_ADDRESSES2\n"); + else + printk(KERN_INFO + " pkt_cmd = CMD_CONFIG_ADDRESSES\n"); + printk(KERN_INFO " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, pkt->hdr.pkt_retry_count); + if (addresses2) { + printk(KERN_INFO " num_address_ops = %x," + " lan_switch_num = %d\n", + pkt->cmd.config_addresses_req2.num_address_ops, + pkt->cmd.config_addresses_req2.lan_switch_num); + list = (struct vnic_address_op *) + pkt->cmd.config_addresses_req2.list_address_ops; + no_address_ops = pkt->cmd.config_addresses_req2.num_address_ops; + print_config_addr(list, no_address_ops, + offsetof(struct vnic_address_op2, mgid)); + } else { + printk(KERN_INFO " num_address_ops = %x," + " lan_switch_num = %d\n", + pkt->cmd.config_addresses_req.num_address_ops, + pkt->cmd.config_addresses_req.lan_switch_num); + list = pkt->cmd.config_addresses_req.list_address_ops; + no_address_ops = pkt->cmd.config_addresses_req.num_address_ops; + print_config_addr(list, no_address_ops, 0); + } +} + 
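The `print_config_addr()` helper above walks one buffer that may hold either `vnic_address_op` entries or the larger `vnic_address_op2` entries (which append an `mgid`), choosing the stride from whether a non-zero `mgid` offset was passed in. A minimal standalone sketch of that variable-stride walk follows; `addr_op`, `addr_op2` and `sum_indices` are illustrative stand-ins, not the driver's actual structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Short layout: common leading fields only (hypothetical subset). */
struct addr_op {
	uint16_t index;
	uint8_t  address[6];
};

/* Extended "2" layout: same leading fields plus a trailing mgid. */
struct addr_op2 {
	uint16_t index;
	uint8_t  address[6];
	uint8_t  mgid[16];
};

/*
 * Walk a contiguous array whose element size depends on the layout:
 * mgidoff == 0 selects the short entries, non-zero the extended ones.
 * Only the common leading fields are touched, so casting each element
 * to the short layout is safe in both cases.
 */
static uint16_t sum_indices(const void *list, int n, size_t mgidoff)
{
	const char *p = list;
	size_t step = mgidoff ? sizeof(struct addr_op2)
			      : sizeof(struct addr_op);
	uint16_t sum = 0;
	int i;

	for (i = 0; i < n; i++, p += step)
		sum += ((const struct addr_op *)p)->index;
	return sum;
}
```

Passing `offsetof(struct addr_op2, mgid)` doubles as both the "extended layout" flag and the place to find the extra field, which is exactly the trick the driver's `mgidoff` parameter plays.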
+static void control_log_exch_pools_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_EXCHANGE_POOLS\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " datapath = %u\n", + pkt->cmd.exchange_pools_req.data_path); + printk(KERN_INFO " pool_rkey = %08x" + " pool_addr = %llx\n", + be32_to_cpu(pkt->cmd.exchange_pools_req.pool_rkey), + be64_to_cpu(pkt->cmd.exchange_pools_req.pool_addr)); +} + +static void control_log_data_path_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_CONFIG_DATA_PATH\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " path_identifier = %llx," + " data_path = %u\n", + pkt->cmd.config_data_path_req.path_identifier, + pkt->cmd.config_data_path_req.data_path); + printk(KERN_INFO + "host config size_recv_pool_entry = %u," + " num_recv_pool_entries = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config.size_recv_pool_entry), + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config.num_recv_pool_entries)); + printk(KERN_INFO + " timeout_before_kick = %u," + " num_recv_pool_entries_before_kick = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config.timeout_before_kick), + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config. + num_recv_pool_entries_before_kick)); + printk(KERN_INFO + " num_recv_pool_bytes_before_kick = %u," + " free_recv_pool_entries_per_update = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config. + num_recv_pool_bytes_before_kick), + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config. + free_recv_pool_entries_per_update)); + printk(KERN_INFO + "eioc config size_recv_pool_entry = %u," + " num_recv_pool_entries = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. 
+ eioc_recv_pool_config.size_recv_pool_entry), + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config.num_recv_pool_entries)); + printk(KERN_INFO + " timeout_before_kick = %u," + " num_recv_pool_entries_before_kick = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config.timeout_before_kick), + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config. + num_recv_pool_entries_before_kick)); + printk(KERN_INFO + " num_recv_pool_bytes_before_kick = %u," + " free_recv_pool_entries_per_update = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config. + num_recv_pool_bytes_before_kick), + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config. + free_recv_pool_entries_per_update)); +} + +static void control_log_init_vnic_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_INIT_VNIC\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO + " vnic_major_version = %u," + " vnic_minor_version = %u\n", + be16_to_cpu(pkt->cmd.init_vnic_req.vnic_major_version), + be16_to_cpu(pkt->cmd.init_vnic_req.vnic_minor_version)); + if (pkt->hdr.pkt_type == TYPE_REQ) { + printk(KERN_INFO + " vnic_instance = %u," + " num_data_paths = %u\n", + pkt->cmd.init_vnic_req.vnic_instance, + pkt->cmd.init_vnic_req.num_data_paths); + printk(KERN_INFO + " num_address_entries = %u\n", + be16_to_cpu(pkt->cmd.init_vnic_req. + num_address_entries)); + } else { + printk(KERN_INFO + " num_lan_switches = %u," + " num_data_paths = %u\n", + pkt->cmd.init_vnic_rsp.num_lan_switches, + pkt->cmd.init_vnic_rsp.num_data_paths); + printk(KERN_INFO + " num_address_entries = %u," + " features_supported = %08x\n", + be16_to_cpu(pkt->cmd.init_vnic_rsp. + num_address_entries), + be32_to_cpu(pkt->cmd.init_vnic_rsp. 
+ features_supported)); + if (pkt->cmd.init_vnic_rsp.num_lan_switches != 0) { + printk(KERN_INFO + "lan_switch[0] lan_switch_num = %u," + " num_enet_ports = %08x\n", + pkt->cmd.init_vnic_rsp. + lan_switch[0].lan_switch_num, + pkt->cmd.init_vnic_rsp. + lan_switch[0].num_enet_ports); + printk(KERN_INFO + " default_vlan = %u," + " hw_mac_address =" + " %02x:%02x:%02x:%02x:%02x:%02x\n", + be16_to_cpu(pkt->cmd.init_vnic_rsp. + lan_switch[0].default_vlan), + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[0], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[1], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[2], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[3], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[4], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[5]); + } + } +} + +static void control_log_control_packet(struct vnic_control_packet *pkt) +{ + switch (pkt->hdr.pkt_type) { + case TYPE_INFO: + printk(KERN_INFO "control_packet: pkt_type = TYPE_INFO\n"); + break; + case TYPE_REQ: + printk(KERN_INFO "control_packet: pkt_type = TYPE_REQ\n"); + break; + case TYPE_RSP: + printk(KERN_INFO "control_packet: pkt_type = TYPE_RSP\n"); + break; + case TYPE_ERR: + printk(KERN_INFO "control_packet: pkt_type = TYPE_ERR\n"); + break; + default: + printk(KERN_INFO "control_packet: pkt_type = UNKNOWN\n"); + } + + switch (pkt->hdr.pkt_cmd) { + case CMD_INIT_VNIC: + control_log_init_vnic_pkt(pkt); + break; + case CMD_CONFIG_DATA_PATH: + control_log_data_path_pkt(pkt); + break; + case CMD_EXCHANGE_POOLS: + control_log_exch_pools_pkt(pkt); + break; + case CMD_CONFIG_ADDRESSES: + control_log_config_addrs_pkt(pkt, 0); + break; + case CMD_CONFIG_ADDRESSES2: + control_log_config_addrs_pkt(pkt, 1); + break; + case CMD_CONFIG_LINK: + control_log_config_link_pkt(pkt); + break; + case CMD_REPORT_STATISTICS: + control_log_report_stats_pkt(pkt); + break; + case CMD_CLEAR_STATISTICS: + printk(KERN_INFO + " pkt_cmd = CMD_CLEAR_STATISTICS\n"); + 
printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + break; + case CMD_REPORT_STATUS: + control_log_report_status_pkt(pkt); + + break; + case CMD_RESET: + printk(KERN_INFO + " pkt_cmd = CMD_RESET\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + break; + case CMD_HEARTBEAT: + printk(KERN_INFO + " pkt_cmd = CMD_HEARTBEAT\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " hb_interval = %d\n", + be32_to_cpu(pkt->cmd.heartbeat_req.hb_interval)); + break; + default: + printk(KERN_INFO + " pkt_cmd = UNKNOWN (%u)\n", + pkt->hdr.pkt_cmd); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + break; + } +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_control.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_control.h new file mode 100644 index 0000000..3cf1fc0 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_control.h @@ -0,0 +1,180 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_CONTROL_H_INCLUDED +#define VNIC_CONTROL_H_INCLUDED + +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS +#include +#include +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ + +#include "vnic_ib.h" +#include "vnic_control_pkt.h" + +enum control_timer_state { + TIMER_IDLE = 0, + TIMER_ACTIVE = 1, + TIMER_EXPIRED = 2 +}; + +enum control_request_state { + REQ_INACTIVE, /* quiet state, all previous operations done + * response is NULL + * last_cmd = CMD_INVALID + * timer_state = IDLE + */ + REQ_POSTED, /* REQ put on send Q + * response is NULL + * last_cmd = command issued + * timer_state = ACTIVE + */ + REQ_SENT, /* Send completed for REQ + * response is NULL + * last_cmd = command issued + * timer_state = ACTIVE + */ + RSP_RECEIVED, /* Received Resp, but no Send completion yet + * response is response buffer received + * last_cmd = command issued + * timer_state = ACTIVE + */ + REQ_COMPLETED, /* all processing for REQ completed, ready to be gotten + * response is response buffer received + * last_cmd = command issued + * timer_state = ACTIVE + */ + REQ_FAILED, /* processing of REQ/RSP failed. 
+ * response is NULL + * last_cmd = CMD_INVALID + * timer_state = IDLE or EXPIRED + * viport has been moved to error state to force + * recovery + */ +}; + +struct control { + struct viport *parent; + struct control_config *config; + struct ib_mr *mr; + struct vnic_ib_conn ib_conn; + struct vnic_control_packet *local_storage; + int send_len; + int recv_len; + u16 maj_ver; + u16 min_ver; + struct vnic_lan_switch_attribs lan_switch; + struct send_io send_io; + struct recv_io *recv_ios; + dma_addr_t send_dma; + dma_addr_t recv_dma; + enum control_timer_state timer_state; + enum control_request_state req_state; + struct timer_list timer; + u8 seq_num; + u8 last_cmd; + struct recv_io *response; + struct recv_io *info; + struct list_head failure_list; + spinlock_t io_lock; + struct completion done; +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + struct { + cycles_t request_time; /* intermediate value */ + cycles_t response_time; + u32 response_num; + cycles_t response_max; + cycles_t response_min; + u32 timeout_num; + } statistics; +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ +}; + +int control_init(struct control *control, struct viport *viport, + struct control_config *config, struct ib_pd *pd); + +void control_cleanup(struct control *control); + +void control_process_async(struct control *control); + +int control_init_vnic_req(struct control *control); +int control_init_vnic_rsp(struct control *control, u32 *features, + u8 *mac_address, u16 *num_addrs, u16 *vlan); + +int control_config_data_path_req(struct control *control, u64 path_id, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc); +int control_config_data_path_rsp(struct control *control, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc, + struct vnic_recv_pool_config *max_host, + struct vnic_recv_pool_config *max_eioc, + struct vnic_recv_pool_config *min_host, + struct vnic_recv_pool_config *min_eioc); + +int control_exchange_pools_req(struct control *control, 
+ u64 addr, u32 rkey); +int control_exchange_pools_rsp(struct control *control, + u64 *addr, u32 *rkey); + +int control_config_link_req(struct control *control, + u16 flags, u16 mtu); +int control_config_link_rsp(struct control *control, + u16 *flags, u16 *mtu); + +int control_config_addrs_req(struct control *control, + struct vnic_address_op2 *addrs, u16 num); +int control_config_addrs_rsp(struct control *control); + +int control_report_statistics_req(struct control *control); +int control_report_statistics_rsp(struct control *control, + struct vnic_cmd_report_stats_rsp *stats); + +int control_heartbeat_req(struct control *control, u32 hb_interval); +int control_heartbeat_rsp(struct control *control); + +int control_reset_req(struct control *control); +int control_reset_rsp(struct control *control); + + +#define control_packet(io) \ + (struct vnic_control_packet *)(io)->virtual_addr +#define control_is_connected(control) \ + (vnic_ib_conn_connected(&((control)->ib_conn))) + +#define control_last_req(control) control_packet(&(control)->send_io) +#define control_features(control) (control)->features_supported + +#define control_get_mac_address(control,addr) \ + memcpy(addr, (control)->lan_switch.hw_mac_address, ETH_ALEN) + +#endif /* VNIC_CONTROL_H_INCLUDED */ diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h new file mode 100644 index 0000000..1fc62fb --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h @@ -0,0 +1,368 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_CONTROL_PKT_H_INCLUDED +#define VNIC_CONTROL_PKT_H_INCLUDED + +#include +#include + +#define VNIC_MAX_NODENAME_LEN 64 + +struct vnic_connection_data { + u64 path_id; + u8 vnic_instance; + u8 path_num; + u8 nodename[VNIC_MAX_NODENAME_LEN + 1]; + u8 reserved; /* for alignment */ + __be32 features_supported; +}; + +struct vnic_control_header { + u8 pkt_type; + u8 pkt_cmd; + u8 pkt_seq_num; + u8 pkt_retry_count; + u32 reserved; /* for 64-bit alignment */ +}; + +/* pkt_type values */ +enum { + TYPE_INFO = 0, + TYPE_REQ = 1, + TYPE_RSP = 2, + TYPE_ERR = 3 +}; + +/* pkt_cmd values */ +enum { + CMD_INVALID = 0, + CMD_INIT_VNIC = 1, + CMD_CONFIG_DATA_PATH = 2, + CMD_EXCHANGE_POOLS = 3, + CMD_CONFIG_ADDRESSES = 4, + CMD_CONFIG_LINK = 5, + CMD_REPORT_STATISTICS = 6, + CMD_CLEAR_STATISTICS = 7, + CMD_REPORT_STATUS = 8, + CMD_RESET = 9, + CMD_HEARTBEAT = 10, + CMD_CONFIG_ADDRESSES2 = 11, +}; + +/* pkt_cmd CMD_INIT_VNIC, pkt_type TYPE_REQ data format */ +struct vnic_cmd_init_vnic_req { + __be16 vnic_major_version; + __be16 vnic_minor_version; + u8 vnic_instance; + u8 num_data_paths; + __be16 num_address_entries; +}; + +/* pkt_cmd CMD_INIT_VNIC, pkt_type TYPE_RSP subdata format */ +struct vnic_lan_switch_attribs { + u8 lan_switch_num; + u8 num_enet_ports; + __be16 default_vlan; + u8 hw_mac_address[ETH_ALEN]; +}; + +/* pkt_cmd CMD_INIT_VNIC, pkt_type TYPE_RSP data format */ +struct vnic_cmd_init_vnic_rsp { + __be16 vnic_major_version; + __be16 vnic_minor_version; + u8 num_lan_switches; + u8 num_data_paths; + __be16 num_address_entries; + __be32 features_supported; + struct vnic_lan_switch_attribs lan_switch[1]; +}; + +/* features_supported values */ +enum { + VNIC_FEAT_IPV4_HEADERS = 0x0001, + VNIC_FEAT_IPV6_HEADERS = 0x0002, + VNIC_FEAT_IPV4_CSUM_RX = 0x0004, + VNIC_FEAT_IPV4_CSUM_TX = 0x0008, + VNIC_FEAT_TCP_CSUM_RX = 0x0010, + VNIC_FEAT_TCP_CSUM_TX = 0x0020, + VNIC_FEAT_UDP_CSUM_RX = 0x0040, + VNIC_FEAT_UDP_CSUM_TX = 0x0080, + VNIC_FEAT_TCP_SEGMENT =
0x0100, + VNIC_FEAT_IPV4_IPSEC_OFFLOAD = 0x0200, + VNIC_FEAT_IPV6_IPSEC_OFFLOAD = 0x0400, + VNIC_FEAT_FCS_PROPAGATE = 0x0800, + VNIC_FEAT_PF_KICK = 0x1000, + VNIC_FEAT_PF_FORCE_ROUTE = 0x2000, + VNIC_FEAT_CHASH_OFFLOAD = 0x4000, + /* host send with immediate data */ + VNIC_FEAT_RDMA_IMMED = 0x8000, + /* host ignore inbound PF_VLAN_INSERT flag */ + VNIC_FEAT_IGNORE_VLAN = 0x10000, + /* host supports IB multicast for inbound Ethernet mcast traffic */ + VNIC_FEAT_INBOUND_IB_MC = 0x20000, +}; + +/* pkt_cmd CMD_CONFIG_DATA_PATH subdata format */ +struct vnic_recv_pool_config { + __be32 size_recv_pool_entry; + __be32 num_recv_pool_entries; + __be32 timeout_before_kick; + __be32 num_recv_pool_entries_before_kick; + __be32 num_recv_pool_bytes_before_kick; + __be32 free_recv_pool_entries_per_update; +}; + +/* pkt_cmd CMD_CONFIG_DATA_PATH data format */ +struct vnic_cmd_config_data_path { + u64 path_identifier; + u8 data_path; + u8 reserved[3]; + struct vnic_recv_pool_config host_recv_pool_config; + struct vnic_recv_pool_config eioc_recv_pool_config; +}; + +/* pkt_cmd CMD_EXCHANGE_POOLS data format */ +struct vnic_cmd_exchange_pools { + u8 data_path; + u8 reserved[3]; + __be32 pool_rkey; + __be64 pool_addr; +}; + +/* pkt_cmd CMD_CONFIG_ADDRESSES subdata format */ +struct vnic_address_op { + __be16 index; + u8 operation; + u8 valid; + u8 address[6]; + __be16 vlan; +}; + +/* pkt_cmd CMD_CONFIG_ADDRESSES2 subdata format */ +struct vnic_address_op2 { + __be16 index; + u8 operation; + u8 valid; + u8 address[6]; + __be16 vlan; + u32 reserved; /* for alignment */ + union ib_gid mgid; /* valid in rsp only if both ends support mcast */ +}; + +/* operation values */ +enum { + VNIC_OP_SET_ENTRY = 0x01, + VNIC_OP_GET_ENTRY = 0x02 +}; + +/* pkt_cmd CMD_CONFIG_ADDRESSES data format */ +struct vnic_cmd_config_addresses { + u8 num_address_ops; + u8 lan_switch_num; + struct vnic_address_op list_address_ops[1]; +}; + +/* pkt_cmd CMD_CONFIG_ADDRESSES2 data format */ +struct 
vnic_cmd_config_addresses2 { + u8 num_address_ops; + u8 lan_switch_num; + u8 reserved1; + u8 reserved2; + u8 reserved3; + struct vnic_address_op2 list_address_ops[1]; +}; + +/* CMD_CONFIG_LINK data format */ +struct vnic_cmd_config_link { + u8 cmd_flags; + u8 lan_switch_num; + __be16 mtu_size; + __be16 default_vlan; + u8 hw_mac_address[6]; + u32 reserved; /* for alignment */ + /* valid in rsp only if both ends support mcast */ + union ib_gid allmulti_mgid; +}; + +/* cmd_flags values */ +enum { + VNIC_FLAG_ENABLE_NIC = 0x01, + VNIC_FLAG_DISABLE_NIC = 0x02, + VNIC_FLAG_ENABLE_MCAST_ALL = 0x04, + VNIC_FLAG_DISABLE_MCAST_ALL = 0x08, + VNIC_FLAG_ENABLE_PROMISC = 0x10, + VNIC_FLAG_DISABLE_PROMISC = 0x20, + VNIC_FLAG_SET_MTU = 0x40 +}; + +/* pkt_cmd CMD_REPORT_STATISTICS, pkt_type TYPE_REQ data format */ +struct vnic_cmd_report_stats_req { + u8 lan_switch_num; +}; + +/* pkt_cmd CMD_REPORT_STATISTICS, pkt_type TYPE_RSP data format */ +struct vnic_cmd_report_stats_rsp { + u8 lan_switch_num; + u8 reserved[7]; /* for 64-bit alignment */ + __be64 if_in_broadcast_pkts; + __be64 if_in_multicast_pkts; + __be64 if_in_octets; + __be64 if_in_ucast_pkts; + __be64 if_in_nucast_pkts; /* if_in_broadcast_pkts + + if_in_multicast_pkts */ + __be64 if_in_underrun; /* (OID_GEN_RCV_NO_BUFFER) */ + __be64 if_in_errors; /* (OID_GEN_RCV_ERROR) */ + __be64 if_out_errors; /* (OID_GEN_XMIT_ERROR) */ + __be64 if_out_octets; + __be64 if_out_ucast_pkts; + __be64 if_out_multicast_pkts; + __be64 if_out_broadcast_pkts; + __be64 if_out_nucast_pkts; /* if_out_broadcast_pkts + + if_out_multicast_pkts */ + __be64 if_out_ok; /* if_out_nucast_pkts + + if_out_ucast_pkts(OID_GEN_XMIT_OK) */ + __be64 if_in_ok; /* if_in_nucast_pkts + + if_in_ucast_pkts(OID_GEN_RCV_OK) */ + __be64 if_out_ucast_bytes; /* (OID_GEN_DIRECTED_BYTES_XMT) */ + __be64 if_out_multicast_bytes; /* (OID_GEN_MULTICAST_BYTES_XMT) */ + __be64 if_out_broadcast_bytes; /* (OID_GEN_BROADCAST_BYTES_XMT) */ + __be64 if_in_ucast_bytes; /* 
(OID_GEN_DIRECTED_BYTES_RCV) */ + __be64 if_in_multicast_bytes; /* (OID_GEN_MULTICAST_BYTES_RCV) */ + __be64 if_in_broadcast_bytes; /* (OID_GEN_BROADCAST_BYTES_RCV) */ + __be64 ethernet_status; /* (OID_GEN_MEDIA_CONNECT_STATUS) */ +}; + +/* pkt_cmd CMD_CLEAR_STATISTICS data format */ +struct vnic_cmd_clear_statistics { + u8 lan_switch_num; +}; + +/* pkt_cmd CMD_REPORT_STATUS data format */ +struct vnic_cmd_report_status { + u8 lan_switch_num; + u8 is_fatal; + u8 reserved[2]; /* for 32-bit alignment */ + __be32 status_number; + __be32 status_info; + u8 file_name[32]; + u8 routine[32]; + __be32 line_num; + __be32 error_parameter; + u8 desc_text[128]; +}; + +/* pkt_cmd CMD_HEARTBEAT data format */ +struct vnic_cmd_heartbeat { + __be32 hb_interval; +}; + +enum { + VNIC_STATUS_LINK_UP = 1, + VNIC_STATUS_LINK_DOWN = 2, + VNIC_STATUS_ENET_AGGREGATION_CHANGE = 3, + VNIC_STATUS_EIOC_SHUTDOWN = 4, + VNIC_STATUS_CONTROL_ERROR = 5, + VNIC_STATUS_EIOC_ERROR = 6 +}; + +#define VNIC_MAX_CONTROLPKTSZ 256 +#define VNIC_MAX_CONTROLDATASZ \ + (VNIC_MAX_CONTROLPKTSZ - sizeof(struct vnic_control_header)) + +struct vnic_control_packet { + struct vnic_control_header hdr; + union { + struct vnic_cmd_init_vnic_req init_vnic_req; + struct vnic_cmd_init_vnic_rsp init_vnic_rsp; + struct vnic_cmd_config_data_path config_data_path_req; + struct vnic_cmd_config_data_path config_data_path_rsp; + struct vnic_cmd_exchange_pools exchange_pools_req; + struct vnic_cmd_exchange_pools exchange_pools_rsp; + struct vnic_cmd_config_addresses config_addresses_req; + struct vnic_cmd_config_addresses2 config_addresses_req2; + struct vnic_cmd_config_addresses config_addresses_rsp; + struct vnic_cmd_config_addresses2 config_addresses_rsp2; + struct vnic_cmd_config_link config_link_req; + struct vnic_cmd_config_link config_link_rsp; + struct vnic_cmd_report_stats_req report_statistics_req; + struct vnic_cmd_report_stats_rsp report_statistics_rsp; + struct vnic_cmd_clear_statistics clear_statistics_req; + struct
vnic_cmd_clear_statistics clear_statistics_rsp; + struct vnic_cmd_report_status report_status; + struct vnic_cmd_heartbeat heartbeat_req; + struct vnic_cmd_heartbeat heartbeat_rsp; + + char cmd_data[VNIC_MAX_CONTROLDATASZ]; + } cmd; +}; + +union ib_gid_cpu { + u8 raw[16]; + struct { + u64 subnet_prefix; + u64 interface_id; + } global; +}; + +static inline void bswap_ib_gid(union ib_gid *mgid1, union ib_gid_cpu *mgid2) +{ + /* swap hi & low */ + __be64 low = mgid1->global.subnet_prefix; + mgid2->global.subnet_prefix = be64_to_cpu(mgid1->global.interface_id); + mgid2->global.interface_id = be64_to_cpu(low); +} + +#define VNIC_GID_FMT "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x" + +#define VNIC_GID_RAW_ARG(gid) be16_to_cpu(*(__be16 *)&(gid)[0]), \ + be16_to_cpu(*(__be16 *)&(gid)[2]), \ + be16_to_cpu(*(__be16 *)&(gid)[4]), \ + be16_to_cpu(*(__be16 *)&(gid)[6]), \ + be16_to_cpu(*(__be16 *)&(gid)[8]), \ + be16_to_cpu(*(__be16 *)&(gid)[10]), \ + be16_to_cpu(*(__be16 *)&(gid)[12]), \ + be16_to_cpu(*(__be16 *)&(gid)[14]) + + +/* These defines are used to figure out how many address entries can be passed + * in config_addresses request. 
+ */ +#define MAX_CONFIG_ADDR_ENTRIES \ + ((VNIC_MAX_CONTROLDATASZ - (sizeof(struct vnic_cmd_config_addresses) \ + - sizeof(struct vnic_address_op)))/sizeof(struct vnic_address_op)) +#define MAX_CONFIG_ADDR_ENTRIES2 \ + ((VNIC_MAX_CONTROLDATASZ - (sizeof(struct vnic_cmd_config_addresses2) \ + - sizeof(struct vnic_address_op2)))/sizeof(struct vnic_address_op2)) + + +#endif /* VNIC_CONTROL_PKT_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:16:54 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:46:54 +0530 Subject: [ofa-general] [PATCH 02/13] QLogic VNIC: Netpath - abstraction of connection to EVIC/VEx In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171654.31725.5636.stgit@localhost.localdomain> From: Ramachandra K This patch implements the netpath layer of QLogic VNIC. Netpath is an abstraction of a connection to EVIC. It primarily includes the implementation which maintains the timers to monitor the status of the connection to EVIC/VEx. Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c | 112 +++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h | 80 ++++++++++++++++ 2 files changed, 192 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c new file mode 100644 index 0000000..820b996 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c @@ -0,0 +1,112 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_viport.h" +#include "vnic_netpath.h" + +static void vnic_npevent_timeout(unsigned long data) +{ + struct netpath *netpath = (struct netpath *)data; + + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_TIMEREXPIRED); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_TIMEREXPIRED); +} + +void netpath_timer(struct netpath *netpath, int timeout) +{ + if (netpath->timer_state == NETPATH_TS_ACTIVE) + del_timer_sync(&netpath->timer); + if (timeout) { + init_timer(&netpath->timer); + netpath->timer_state = NETPATH_TS_ACTIVE; + netpath->timer.expires = jiffies + timeout; + netpath->timer.data = (unsigned long)netpath; + netpath->timer.function = vnic_npevent_timeout; + add_timer(&netpath->timer); + } else + vnic_npevent_timeout((unsigned long)netpath); +} + +void netpath_timer_stop(struct netpath *netpath) +{ + if (netpath->timer_state != NETPATH_TS_ACTIVE) + return; + del_timer_sync(&netpath->timer); + if (netpath->second_bias) + vnic_npevent_dequeue_evt(netpath, VNIC_SECNP_TIMEREXPIRED); + else + vnic_npevent_dequeue_evt(netpath, VNIC_PRINP_TIMEREXPIRED); + + netpath->timer_state = NETPATH_TS_IDLE; +} + +void netpath_free(struct netpath *netpath) +{ + if (!netpath->viport) + return; + viport_free(netpath->viport); + netpath->viport = NULL; + sysfs_remove_group(&netpath->dev_info.dev.kobj, + &vnic_path_attr_group); + device_unregister(&netpath->dev_info.dev); + wait_for_completion(&netpath->dev_info.released); +} + +void netpath_init(struct netpath *netpath, struct vnic *vnic, + int second_bias) +{ + netpath->parent = vnic; + netpath->carrier = 0; + netpath->viport = NULL; + netpath->second_bias = second_bias; + netpath->timer_state = NETPATH_TS_IDLE; + init_timer(&netpath->timer); +} + +const char *netpath_to_string(struct vnic *vnic, struct netpath *netpath) +{ + if (!netpath) + return "NULL"; + else if (netpath == &vnic->primary_path) + return "PRIMARY"; 
+ else if (netpath == &vnic->secondary_path) + return "SECONDARY"; + else + return "UNKNOWN"; +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h new file mode 100644 index 0000000..1259ae0 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h @@ -0,0 +1,80 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_NETPATH_H_INCLUDED +#define VNIC_NETPATH_H_INCLUDED + +#include + +#include "vnic_sys.h" + +struct viport; +struct vnic; + +enum netpath_ts { + NETPATH_TS_IDLE = 0, + NETPATH_TS_ACTIVE = 1, + NETPATH_TS_EXPIRED = 2 +}; + +struct netpath { + int carrier; + struct vnic *parent; + struct viport *viport; + size_t path_idx; + u32 connect_time; + int second_bias; + u8 is_primary_path; + u8 delay_reconnect; + int cleanup_started; + struct timer_list timer; + enum netpath_ts timer_state; + struct dev_info dev_info; +}; + +void netpath_init(struct netpath *netpath, struct vnic *vnic, + int second_bias); +void netpath_free(struct netpath *netpath); + +void netpath_timer(struct netpath *netpath, int timeout); +void netpath_timer_stop(struct netpath *netpath); + +const char *netpath_to_string(struct vnic *vnic, struct netpath *netpath); + +#define netpath_get_hw_addr(netpath, address) \ + viport_get_hw_addr((netpath)->viport, address) +#define netpath_is_connected(netpath) \ + (netpath->state == NETPATH_CONNECTED) +#define netpath_can_tx_csum(netpath) \ + viport_can_tx_csum(netpath->viport) + +#endif /* VNIC_NETPATH_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:18:25 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:48:25 +0530 Subject: [ofa-general] [PATCH 05/13] QLogic VNIC: Implementation of Data path of communication protocol In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171824.31725.5212.stgit@localhost.localdomain> From: Ramachandra K This patch implements the actual data transfer part of the communication protocol with the EVIC/VEx. RDMA of ethernet packets is implemented in here. 
Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_data.c | 1473 +++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_data.h | 206 +++ drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h | 103 ++ 3 files changed, 1782 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_data.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_data.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_data.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_data.c new file mode 100644 index 0000000..599e716 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_data.c @@ -0,0 +1,1473 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_viport.h" +#include "vnic_main.h" +#include "vnic_data.h" +#include "vnic_trailer.h" +#include "vnic_stats.h" + +static void data_received_kick(struct io *io); +static void data_xmit_complete(struct io *io); + +static void mc_data_recv_routine(struct io *io); +static void mc_data_post_recvs(struct mc_data *mc_data); +static void mc_data_recv_to_skbuff(struct viport *viport, struct sk_buff *skb, + struct viport_trailer *trailer); + +static u32 min_rcv_skb = 60; +module_param(min_rcv_skb, int, 0444); +MODULE_PARM_DESC(min_rcv_skb, "Packets of size (in bytes) less than" + " or equal to this value will be copied during receive." + " Default 60"); + +static u32 min_xmt_skb = 60; +module_param(min_xmt_skb, int, 0444); +MODULE_PARM_DESC(min_xmt_skb, "Packets of size (in bytes) less than" + " or equal to this value will be copied during transmit."
+ "Default 60"); + +int data_init(struct data *data, struct viport *viport, + struct data_config *config, struct ib_pd *pd) +{ + DATA_FUNCTION("data_init()\n"); + + data->parent = viport; + data->config = config; + data->ib_conn.viport = viport; + data->ib_conn.ib_config = &config->ib_config; + data->ib_conn.state = IB_CONN_UNINITTED; + data->ib_conn.callback_thread = NULL; + data->ib_conn.callback_thread_end = 0; + + if ((min_xmt_skb < 60) || (min_xmt_skb > 9000)) { + DATA_ERROR("min_xmt_skb (%d) must be between 60 and 9000\n", + min_xmt_skb); + goto failure; + } + if (vnic_ib_conn_init(&data->ib_conn, viport, pd, + &config->ib_config)) { + DATA_ERROR("Data IB connection initialization failed\n"); + goto failure; + } + data->mr = ib_get_dma_mr(pd, + IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_READ | + IB_ACCESS_REMOTE_WRITE); + if (IS_ERR(data->mr)) { + DATA_ERROR("failed to register memory for" + " data connection\n"); + goto destroy_conn; + } + + data->ib_conn.cm_id = ib_create_cm_id(viport->config->ibdev, + vnic_ib_cm_handler, + &data->ib_conn); + + if (IS_ERR(data->ib_conn.cm_id)) { + DATA_ERROR("creating data CM ID failed\n"); + goto dereg_mr; + } + + return 0; + +dereg_mr: + ib_dereg_mr(data->mr); +destroy_conn: + completion_callback_cleanup(&data->ib_conn); + ib_destroy_qp(data->ib_conn.qp); + ib_destroy_cq(data->ib_conn.cq); +failure: + return -1; +} + +static void data_post_recvs(struct data *data) +{ + unsigned long flags; + int i = 0; + + DATA_FUNCTION("data_post_recvs()\n"); + spin_lock_irqsave(&data->recv_ios_lock, flags); + while (!list_empty(&data->recv_ios)) { + struct io *io = list_entry(data->recv_ios.next, + struct io, list_ptrs); + struct recv_io *recv_io = (struct recv_io *)io; + + list_del(&recv_io->io.list_ptrs); + spin_unlock_irqrestore(&data->recv_ios_lock, flags); + if (vnic_ib_post_recv(&data->ib_conn, &recv_io->io)) { + viport_failure(data->parent); + return; + } + i++; + spin_lock_irqsave(&data->recv_ios_lock, flags); + } + 
spin_unlock_irqrestore(&data->recv_ios_lock, flags); + DATA_INFO("data posted %d %p\n", i, &data->recv_ios); +} + +static void data_init_pool_work_reqs(struct data *data, + struct recv_io *recv_io) +{ + struct recv_pool *recv_pool = &data->recv_pool; + struct xmit_pool *xmit_pool = &data->xmit_pool; + struct rdma_io *rdma_io; + struct rdma_dest *rdma_dest; + dma_addr_t xmit_dma; + u8 *xmit_data; + unsigned int i; + + INIT_LIST_HEAD(&data->recv_ios); + spin_lock_init(&data->recv_ios_lock); + spin_lock_init(&data->xmit_buf_lock); + for (i = 0; i < data->config->num_recvs; i++) { + recv_io[i].io.viport = data->parent; + recv_io[i].io.routine = data_received_kick; + recv_io[i].list.addr = data->region_data_dma; + recv_io[i].list.length = 4; + recv_io[i].list.lkey = data->mr->lkey; + + recv_io[i].io.rwr.wr_id = (u64)&recv_io[i].io; + recv_io[i].io.rwr.sg_list = &recv_io[i].list; + recv_io[i].io.rwr.num_sge = 1; + + list_add(&recv_io[i].io.list_ptrs, &data->recv_ios); + } + + INIT_LIST_HEAD(&recv_pool->avail_recv_bufs); + for (i = 0; i < recv_pool->pool_sz; i++) { + rdma_dest = &recv_pool->recv_bufs[i]; + list_add(&rdma_dest->list_ptrs, + &recv_pool->avail_recv_bufs); + } + + xmit_dma = xmit_pool->xmitdata_dma; + xmit_data = xmit_pool->xmit_data; + + for (i = 0; i < xmit_pool->num_xmit_bufs; i++) { + rdma_io = &xmit_pool->xmit_bufs[i]; + rdma_io->index = i; + rdma_io->io.viport = data->parent; + rdma_io->io.routine = data_xmit_complete; + + rdma_io->list[0].lkey = data->mr->lkey; + rdma_io->list[1].lkey = data->mr->lkey; + rdma_io->io.swr.wr_id = (u64)rdma_io; + rdma_io->io.swr.sg_list = rdma_io->list; + rdma_io->io.swr.num_sge = 2; + rdma_io->io.swr.opcode = IB_WR_RDMA_WRITE; + rdma_io->io.swr.send_flags = IB_SEND_SIGNALED; + rdma_io->io.type = RDMA; + + rdma_io->data = xmit_data; + rdma_io->data_dma = xmit_dma; + + xmit_data += ALIGN(min_xmt_skb, VIPORT_TRAILER_ALIGNMENT); + xmit_dma += ALIGN(min_xmt_skb, VIPORT_TRAILER_ALIGNMENT); + rdma_io->trailer = (struct 
viport_trailer *)xmit_data; + rdma_io->trailer_dma = xmit_dma; + xmit_data += sizeof(struct viport_trailer); + xmit_dma += sizeof(struct viport_trailer); + } + + xmit_pool->rdma_rkey = data->mr->rkey; + xmit_pool->rdma_addr = xmit_pool->buf_pool_dma; +} + +static void data_init_free_bufs_swrs(struct data *data) +{ + struct rdma_io *rdma_io; + struct send_io *send_io; + + rdma_io = &data->free_bufs_io; + rdma_io->io.viport = data->parent; + rdma_io->io.routine = NULL; + + rdma_io->list[0].lkey = data->mr->lkey; + + rdma_io->io.swr.wr_id = (u64)rdma_io; + rdma_io->io.swr.sg_list = rdma_io->list; + rdma_io->io.swr.num_sge = 1; + rdma_io->io.swr.opcode = IB_WR_RDMA_WRITE; + rdma_io->io.swr.send_flags = IB_SEND_SIGNALED; + rdma_io->io.type = RDMA; + + send_io = &data->kick_io; + send_io->io.viport = data->parent; + send_io->io.routine = NULL; + + send_io->list.addr = data->region_data_dma; + send_io->list.length = 0; + send_io->list.lkey = data->mr->lkey; + + send_io->io.swr.wr_id = (u64)send_io; + send_io->io.swr.sg_list = &send_io->list; + send_io->io.swr.num_sge = 1; + send_io->io.swr.opcode = IB_WR_SEND; + send_io->io.swr.send_flags = IB_SEND_SIGNALED; + send_io->io.type = SEND; +} + +static int data_init_buf_pools(struct data *data) +{ + struct recv_pool *recv_pool = &data->recv_pool; + struct xmit_pool *xmit_pool = &data->xmit_pool; + struct viport *viport = data->parent; + + recv_pool->buf_pool_len = + sizeof(struct buff_pool_entry) * recv_pool->eioc_pool_sz; + + recv_pool->buf_pool = kzalloc(recv_pool->buf_pool_len, GFP_KERNEL); + + if (!recv_pool->buf_pool) { + DATA_ERROR("failed allocating %d bytes" + " for recv pool bufpool\n", + recv_pool->buf_pool_len); + goto failure; + } + + recv_pool->buf_pool_dma = + ib_dma_map_single(viport->config->ibdev, + recv_pool->buf_pool, recv_pool->buf_pool_len, + DMA_TO_DEVICE); + + if (ib_dma_mapping_error(viport->config->ibdev, recv_pool->buf_pool_dma)) { + DATA_ERROR("xmit buf_pool dma map error\n"); + goto free_recv_pool; 
+ } + + xmit_pool->buf_pool_len = + sizeof(struct buff_pool_entry) * xmit_pool->pool_sz; + xmit_pool->buf_pool = kzalloc(xmit_pool->buf_pool_len, GFP_KERNEL); + + if (!xmit_pool->buf_pool) { + DATA_ERROR("failed allocating %d bytes" + " for xmit pool bufpool\n", + xmit_pool->buf_pool_len); + goto unmap_recv_pool; + } + + xmit_pool->buf_pool_dma = + ib_dma_map_single(viport->config->ibdev, + xmit_pool->buf_pool, xmit_pool->buf_pool_len, + DMA_FROM_DEVICE); + + if (ib_dma_mapping_error(viport->config->ibdev, xmit_pool->buf_pool_dma)) { + DATA_ERROR("xmit buf_pool dma map error\n"); + goto free_xmit_pool; + } + + xmit_pool->xmit_data = kzalloc(xmit_pool->xmitdata_len, GFP_KERNEL); + + if (!xmit_pool->xmit_data) { + DATA_ERROR("failed allocating %d bytes for xmit data\n", + xmit_pool->xmitdata_len); + goto unmap_xmit_pool; + } + + xmit_pool->xmitdata_dma = + ib_dma_map_single(viport->config->ibdev, + xmit_pool->xmit_data, xmit_pool->xmitdata_len, + DMA_TO_DEVICE); + + if (ib_dma_mapping_error(viport->config->ibdev, xmit_pool->xmitdata_dma)) { + DATA_ERROR("xmit data dma map error\n"); + goto free_xmit_data; + } + + return 0; + +free_xmit_data: + kfree(xmit_pool->xmit_data); +unmap_xmit_pool: + ib_dma_unmap_single(data->parent->config->ibdev, + xmit_pool->buf_pool_dma, + xmit_pool->buf_pool_len, DMA_FROM_DEVICE); +free_xmit_pool: + kfree(xmit_pool->buf_pool); +unmap_recv_pool: + ib_dma_unmap_single(data->parent->config->ibdev, + recv_pool->buf_pool_dma, + recv_pool->buf_pool_len, DMA_TO_DEVICE); +free_recv_pool: + kfree(recv_pool->buf_pool); +failure: + return -1; +} + +static void data_init_xmit_pool(struct data *data) +{ + struct xmit_pool *xmit_pool = &data->xmit_pool; + + xmit_pool->pool_sz = + be32_to_cpu(data->eioc_pool_parms.num_recv_pool_entries); + xmit_pool->buffer_sz = + be32_to_cpu(data->eioc_pool_parms.size_recv_pool_entry); + + xmit_pool->notify_count = 0; + xmit_pool->notify_bundle = data->config->notify_bundle; + xmit_pool->next_xmit_pool = 0; + 
+	xmit_pool->num_xmit_bufs = xmit_pool->notify_bundle * 2;
+	xmit_pool->next_xmit_buf = 0;
+	xmit_pool->last_comp_buf = xmit_pool->num_xmit_bufs - 1;
+	/* This assumes that data_init_recv_pool has been called
+	 * before.
+	 */
+	data->max_mtu = MAX_PAYLOAD(min((data)->recv_pool.buffer_sz,
+					(data)->xmit_pool.buffer_sz)) - VLAN_ETH_HLEN;
+
+	xmit_pool->kick_count = 0;
+	xmit_pool->kick_byte_count = 0;
+
+	xmit_pool->send_kicks =
+		be32_to_cpu(data->
+			    eioc_pool_parms.num_recv_pool_entries_before_kick)
+		|| be32_to_cpu(data->
+			       eioc_pool_parms.num_recv_pool_bytes_before_kick);
+	xmit_pool->kick_bundle =
+		be32_to_cpu(data->
+			    eioc_pool_parms.num_recv_pool_entries_before_kick);
+	xmit_pool->kick_byte_bundle =
+		be32_to_cpu(data->
+			    eioc_pool_parms.num_recv_pool_bytes_before_kick);
+
+	xmit_pool->need_buffers = 1;
+
+	xmit_pool->xmitdata_len =
+		BUFFER_SIZE(min_xmt_skb) * xmit_pool->num_xmit_bufs;
+}
+
+static void data_init_recv_pool(struct data *data)
+{
+	struct recv_pool *recv_pool = &data->recv_pool;
+
+	recv_pool->pool_sz = data->config->host_recv_pool_entries;
+	recv_pool->eioc_pool_sz =
+		be32_to_cpu(data->host_pool_parms.num_recv_pool_entries);
+	if (recv_pool->pool_sz > recv_pool->eioc_pool_sz)
+		recv_pool->pool_sz =
+			be32_to_cpu(data->host_pool_parms.num_recv_pool_entries);
+
+	recv_pool->buffer_sz =
+		be32_to_cpu(data->host_pool_parms.size_recv_pool_entry);
+
+	recv_pool->sz_free_bundle =
+		be32_to_cpu(data->
+			    host_pool_parms.free_recv_pool_entries_per_update);
+	recv_pool->num_free_bufs = 0;
+	recv_pool->num_posted_bufs = 0;
+
+	recv_pool->next_full_buf = 0;
+	recv_pool->next_free_buf = 0;
+	recv_pool->kick_on_free = 0;
+}
+
+int data_connect(struct data *data)
+{
+	struct xmit_pool *xmit_pool = &data->xmit_pool;
+	struct recv_pool *recv_pool = &data->recv_pool;
+	struct recv_io *recv_io;
+	unsigned int sz;
+	struct viport *viport = data->parent;
+
+	DATA_FUNCTION("data_connect()\n");
+
+	/* Do not interchange the order of the functions
+	 * called below as this will affect the MAX MTU
+	 * calculation
+	 */
+
+	data_init_recv_pool(data);
+	data_init_xmit_pool(data);
+
+	sz = sizeof(struct rdma_dest) * recv_pool->pool_sz +
+	     sizeof(struct recv_io) * data->config->num_recvs +
+	     sizeof(struct rdma_io) * xmit_pool->num_xmit_bufs;
+
+	data->local_storage = vmalloc(sz);
+
+	if (!data->local_storage) {
+		DATA_ERROR("failed allocating %d bytes"
+			   " local storage\n", sz);
+		goto out;
+	}
+
+	memset(data->local_storage, 0, sz);
+
+	recv_pool->recv_bufs = (struct rdma_dest *)data->local_storage;
+	sz = sizeof(struct rdma_dest) * recv_pool->pool_sz;
+
+	recv_io = (struct recv_io *)(data->local_storage + sz);
+	sz += sizeof(struct recv_io) * data->config->num_recvs;
+
+	xmit_pool->xmit_bufs = (struct rdma_io *)(data->local_storage + sz);
+	data->region_data = kzalloc(4, GFP_KERNEL);
+
+	if (!data->region_data) {
+		DATA_ERROR("failed to alloc memory for region data\n");
+		goto free_local_storage;
+	}
+
+	data->region_data_dma =
+		ib_dma_map_single(viport->config->ibdev,
+				  data->region_data, 4, DMA_BIDIRECTIONAL);
+
+	if (ib_dma_mapping_error(viport->config->ibdev, data->region_data_dma)) {
+		DATA_ERROR("region data dma map error\n");
+		goto free_region_data;
+	}
+
+	if (data_init_buf_pools(data))
+		goto unmap_region_data;
+
+	data_init_free_bufs_swrs(data);
+	data_init_pool_work_reqs(data, recv_io);
+
+	data_post_recvs(data);
+
+	if (vnic_ib_cm_connect(&data->ib_conn))
+		goto unmap_region_data;
+
+	return 0;
+
+unmap_region_data:
+	ib_dma_unmap_single(data->parent->config->ibdev,
+			    data->region_data_dma, 4, DMA_BIDIRECTIONAL);
+free_region_data:
+	kfree(data->region_data);
+free_local_storage:
+	vfree(data->local_storage);
+out:
+	return -1;
+}
+
+static void data_add_free_buffer(struct data *data, int index,
+				 struct rdma_dest *rdma_dest)
+{
+	struct recv_pool *pool = &data->recv_pool;
+	struct buff_pool_entry *bpe;
+	dma_addr_t vaddr_dma;
+
+	DATA_FUNCTION("data_add_free_buffer()\n");
+	rdma_dest->trailer->connection_hash_and_valid = 0;
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->buf_pool_dma, pool->buf_pool_len,
+				   DMA_TO_DEVICE);
+
+	bpe = &pool->buf_pool[index];
+	bpe->rkey = cpu_to_be32(data->mr->rkey);
+	vaddr_dma = ib_dma_map_single(data->parent->config->ibdev,
+				      rdma_dest->data, pool->buffer_sz,
+				      DMA_FROM_DEVICE);
+	if (ib_dma_mapping_error(data->parent->config->ibdev, vaddr_dma)) {
+		DATA_ERROR("rdma_dest->data dma map error\n");
+		goto failure;
+	}
+	bpe->remote_addr = cpu_to_be64(vaddr_dma);
+	bpe->valid = (u32) (rdma_dest - &pool->recv_bufs[0]) + 1;
+	++pool->num_free_bufs;
+failure:
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->buf_pool_dma, pool->buf_pool_len,
+				      DMA_TO_DEVICE);
+}
+
+/* NOTE: this routine is not reentrant */
+static void data_alloc_buffers(struct data *data, int initial_allocation)
+{
+	struct recv_pool *pool = &data->recv_pool;
+	struct rdma_dest *rdma_dest;
+	struct sk_buff *skb;
+	int index;
+
+	DATA_FUNCTION("data_alloc_buffers()\n");
+	index = ADD(pool->next_free_buf, pool->num_free_bufs,
+		    pool->eioc_pool_sz);
+
+	while (!list_empty(&pool->avail_recv_bufs)) {
+		rdma_dest =
+			list_entry(pool->avail_recv_bufs.next,
+				   struct rdma_dest, list_ptrs);
+		if (!rdma_dest->skb) {
+			if (initial_allocation)
+				skb = alloc_skb(pool->buffer_sz + 2,
+						GFP_KERNEL);
+			else
+				skb = dev_alloc_skb(pool->buffer_sz + 2);
+			if (!skb)
+				break;
+			skb_reserve(skb, 2);
+			skb_put(skb, pool->buffer_sz);
+			rdma_dest->skb = skb;
+			rdma_dest->data = skb->data;
+			rdma_dest->trailer =
+				(struct viport_trailer *)(rdma_dest->data +
+							  pool->buffer_sz -
+							  sizeof(struct
+								 viport_trailer));
+		}
+		rdma_dest->trailer->connection_hash_and_valid = 0;
+
+		list_del_init(&rdma_dest->list_ptrs);
+
+		data_add_free_buffer(data, index, rdma_dest);
+		index = NEXT(index, pool->eioc_pool_sz);
+	}
+}
+
+static void data_send_kick_message(struct data *data)
+{
+	struct xmit_pool *pool = &data->xmit_pool;
+	DATA_FUNCTION("data_send_kick_message()\n");
+	/* stop timer for bundle_timeout */
+	if (data->kick_timer_on) {
+		del_timer(&data->kick_timer);
+		data->kick_timer_on = 0;
+	}
+	pool->kick_count = 0;
+	pool->kick_byte_count = 0;
+
+	/* TODO: keep track of when kick is outstanding, and
+	 * don't reuse until complete
+	 */
+	if (vnic_ib_post_send(&data->ib_conn, &data->free_bufs_io.io)) {
+		DATA_ERROR("failed to post send\n");
+		viport_failure(data->parent);
+	}
+}
+
+static void data_send_free_recv_buffers(struct data *data)
+{
+	struct recv_pool *pool = &data->recv_pool;
+	struct ib_send_wr *swr = &data->free_bufs_io.io.swr;
+
+	int bufs_sent = 0;
+	u64 rdma_addr;
+	u32 offset;
+	u32 sz;
+	unsigned int num_to_send, next_increment;
+
+	DATA_FUNCTION("data_send_free_recv_buffers()\n");
+
+	for (num_to_send = pool->sz_free_bundle;
+	     num_to_send <= pool->num_free_bufs;
+	     num_to_send += pool->sz_free_bundle) {
+		/* handle multiple bundles as one when possible. */
+		next_increment = num_to_send + pool->sz_free_bundle;
+		if ((next_increment <= pool->num_free_bufs)
+		    && (pool->next_free_buf + next_increment <=
+			pool->eioc_pool_sz))
+			continue;
+
+		offset = pool->next_free_buf *
+			 sizeof(struct buff_pool_entry);
+		sz = num_to_send * sizeof(struct buff_pool_entry);
+		rdma_addr = pool->eioc_rdma_addr + offset;
+		swr->sg_list->length = sz;
+		swr->sg_list->addr = pool->buf_pool_dma + offset;
+		swr->wr.rdma.remote_addr = rdma_addr;
+
+		if (vnic_ib_post_send(&data->ib_conn,
+				      &data->free_bufs_io.io)) {
+			DATA_ERROR("failed to post send\n");
+			viport_failure(data->parent);
+			return;
+		}
+		INC(pool->next_free_buf, num_to_send, pool->eioc_pool_sz);
+		pool->num_free_bufs -= num_to_send;
+		pool->num_posted_bufs += num_to_send;
+		bufs_sent = 1;
+	}
+
+	if (bufs_sent) {
+		if (pool->kick_on_free)
+			data_send_kick_message(data);
+	}
+	if (pool->num_posted_bufs == 0) {
+		struct vnic *vnic = data->parent->vnic;
+
+		if (vnic->current_path == &vnic->primary_path)
+			DATA_ERROR("%s: primary path: "
+				   "unable to allocate receive buffers\n",
+				   vnic->config->name);
+		else if (vnic->current_path == &vnic->secondary_path)
+			DATA_ERROR("%s: secondary path: "
+				   "unable to allocate receive buffers\n",
+				   vnic->config->name);
+		data->ib_conn.state = IB_CONN_ERRORED;
+		viport_failure(data->parent);
+	}
+}
+
+void data_connected(struct data *data)
+{
+	DATA_FUNCTION("data_connected()\n");
+	data->free_bufs_io.io.swr.wr.rdma.rkey =
+		data->recv_pool.eioc_rdma_rkey;
+	data_alloc_buffers(data, 1);
+	data_send_free_recv_buffers(data);
+	data->connected = 1;
+}
+
+void data_disconnect(struct data *data)
+{
+	struct xmit_pool *xmit_pool = &data->xmit_pool;
+	struct recv_pool *recv_pool = &data->recv_pool;
+	unsigned int i;
+
+	DATA_FUNCTION("data_disconnect()\n");
+
+	data->connected = 0;
+	if (data->kick_timer_on) {
+		del_timer_sync(&data->kick_timer);
+		data->kick_timer_on = 0;
+	}
+
+	if (ib_send_cm_dreq(data->ib_conn.cm_id, NULL, 0))
+		DATA_ERROR("data CM DREQ sending failed\n");
+	data->ib_conn.state = IB_CONN_DISCONNECTED;
+
+	completion_callback_cleanup(&data->ib_conn);
+
+	for (i = 0; i < xmit_pool->num_xmit_bufs; i++) {
+		if (xmit_pool->xmit_bufs[i].skb)
+			dev_kfree_skb(xmit_pool->xmit_bufs[i].skb);
+		xmit_pool->xmit_bufs[i].skb = NULL;
+
+	}
+	for (i = 0; i < recv_pool->pool_sz; i++) {
+		if (data->recv_pool.recv_bufs[i].skb)
+			dev_kfree_skb(recv_pool->recv_bufs[i].skb);
+		recv_pool->recv_bufs[i].skb = NULL;
+	}
+	vfree(data->local_storage);
+	if (data->region_data) {
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    data->region_data_dma, 4,
+				    DMA_BIDIRECTIONAL);
+		kfree(data->region_data);
+	}
+
+	if (recv_pool->buf_pool) {
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    recv_pool->buf_pool_dma,
+				    recv_pool->buf_pool_len, DMA_TO_DEVICE);
+		kfree(recv_pool->buf_pool);
+	}
+
+	if (xmit_pool->buf_pool) {
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    xmit_pool->buf_pool_dma,
+				    xmit_pool->buf_pool_len, DMA_FROM_DEVICE);
+		kfree(xmit_pool->buf_pool);
+	}
+
+	if (xmit_pool->xmit_data) {
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    xmit_pool->xmitdata_dma,
+				    xmit_pool->xmitdata_len, DMA_TO_DEVICE);
+		kfree(xmit_pool->xmit_data);
+	}
+}
+
+void data_cleanup(struct data *data)
+{
+	ib_destroy_cm_id(data->ib_conn.cm_id);
+
+	/* Completion callback cleanup called again.
+	 * This is to cleanup the threads in case there is an
+	 * error before state LINK_DATACONNECT due to which
+	 * data_disconnect is not called.
+	 */
+	completion_callback_cleanup(&data->ib_conn);
+	ib_destroy_qp(data->ib_conn.qp);
+	ib_destroy_cq(data->ib_conn.cq);
+	ib_dereg_mr(data->mr);
+
+}
+
+static int data_alloc_xmit_buffer(struct data *data, struct sk_buff *skb,
+				  struct buff_pool_entry **pp_bpe,
+				  struct rdma_io **pp_rdma_io,
+				  int *last)
+{
+	struct xmit_pool *pool = &data->xmit_pool;
+	unsigned long flags;
+	int ret;
+
+	DATA_FUNCTION("data_alloc_xmit_buffer()\n");
+
+	spin_lock_irqsave(&data->xmit_buf_lock, flags);
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->buf_pool_dma, pool->buf_pool_len,
+				   DMA_TO_DEVICE);
+	*last = 0;
+	*pp_rdma_io = &pool->xmit_bufs[pool->next_xmit_buf];
+	*pp_bpe = &pool->buf_pool[pool->next_xmit_pool];
+
+	if ((*pp_bpe)->valid && pool->next_xmit_buf !=
+	    pool->last_comp_buf) {
+		INC(pool->next_xmit_buf, 1, pool->num_xmit_bufs);
+		INC(pool->next_xmit_pool, 1, pool->pool_sz);
+		if (!pool->buf_pool[pool->next_xmit_pool].valid) {
+			DATA_INFO("just used the last EIOU"
+				  " receive buffer\n");
+			*last = 1;
+			pool->need_buffers = 1;
+			vnic_stop_xmit(data->parent->vnic,
+				       data->parent->parent);
+			data_kickreq_stats(data);
+		} else if (pool->next_xmit_buf == pool->last_comp_buf) {
+			DATA_INFO("just used our last xmit buffer\n");
+			pool->need_buffers = 1;
+			vnic_stop_xmit(data->parent->vnic,
+				       data->parent->parent);
+		}
+		(*pp_rdma_io)->skb = skb;
+		(*pp_bpe)->valid = 0;
+		ret = 0;
+	} else {
+		data_no_xmitbuf_stats(data);
+		DATA_ERROR("Out of xmit buffers\n");
+		vnic_stop_xmit(data->parent->vnic,
+			       data->parent->parent);
+		ret = -1;
+	}
+
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->buf_pool_dma,
+				      pool->buf_pool_len, DMA_TO_DEVICE);
+	spin_unlock_irqrestore(&data->xmit_buf_lock, flags);
+	return ret;
+}
+
+static void data_rdma_packet(struct data *data, struct buff_pool_entry *bpe,
+			     struct rdma_io *rdma_io)
+{
+	struct ib_send_wr *swr;
+	struct sk_buff *skb;
+	dma_addr_t trailer_data_dma;
+	dma_addr_t skb_data_dma;
+	struct xmit_pool *xmit_pool = &data->xmit_pool;
+	struct viport *viport = data->parent;
+	u8 *d;
+	int len;
+	int fill_len;
+
+	DATA_FUNCTION("data_rdma_packet()\n");
+	swr = &rdma_io->io.swr;
+	skb = rdma_io->skb;
+	len = ALIGN(rdma_io->len, VIPORT_TRAILER_ALIGNMENT);
+	fill_len = len - skb->len;
+
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   xmit_pool->xmitdata_dma,
+				   xmit_pool->xmitdata_len, DMA_TO_DEVICE);
+
+	d = (u8 *) rdma_io->trailer - fill_len;
+	trailer_data_dma = rdma_io->trailer_dma - fill_len;
+	memset(d, 0, fill_len);
+
+	swr->sg_list[0].length = skb->len;
+	if (skb->len <= min_xmt_skb) {
+		memcpy(rdma_io->data, skb->data, skb->len);
+		swr->sg_list[0].lkey = data->mr->lkey;
+		swr->sg_list[0].addr = rdma_io->data_dma;
+		dev_kfree_skb_any(skb);
+		rdma_io->skb = NULL;
+	} else {
+		swr->sg_list[0].lkey = data->mr->lkey;
+
+		skb_data_dma = ib_dma_map_single(viport->config->ibdev,
+						 skb->data, skb->len,
+						 DMA_TO_DEVICE);
+
+		if (ib_dma_mapping_error(viport->config->ibdev, skb_data_dma)) {
+			DATA_ERROR("skb data dma map error\n");
+			goto failure;
+		}
+
+		rdma_io->skb_data_dma = skb_data_dma;
+
+		swr->sg_list[0].addr = skb_data_dma;
+		skb_orphan(skb);
+	}
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   xmit_pool->buf_pool_dma,
+				   xmit_pool->buf_pool_len, DMA_TO_DEVICE);
+
+	swr->sg_list[1].addr = trailer_data_dma;
+	swr->sg_list[1].length = fill_len + sizeof(struct viport_trailer);
+	swr->sg_list[0].lkey = data->mr->lkey;
+	swr->wr.rdma.remote_addr = be64_to_cpu(bpe->remote_addr);
+	swr->wr.rdma.remote_addr += data->xmit_pool.buffer_sz;
+	swr->wr.rdma.remote_addr -= (sizeof(struct viport_trailer) + len);
+	swr->wr.rdma.rkey = be32_to_cpu(bpe->rkey);
+
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      xmit_pool->buf_pool_dma,
+				      xmit_pool->buf_pool_len, DMA_TO_DEVICE);
+
+	/* If VNIC_FEAT_RDMA_IMMED is supported then change the work request
+	 * opcode to IB_WR_RDMA_WRITE_WITH_IMM
+	 */
+
+	if (data->parent->features_supported & VNIC_FEAT_RDMA_IMMED) {
+		swr->ex.imm_data = 0;
+		swr->opcode = IB_WR_RDMA_WRITE_WITH_IMM;
+	}
+
+	data->xmit_pool.notify_count++;
+	if (data->xmit_pool.notify_count >= data->xmit_pool.notify_bundle) {
+		data->xmit_pool.notify_count = 0;
+		swr->send_flags = IB_SEND_SIGNALED;
+	} else {
+		swr->send_flags = 0;
+	}
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      xmit_pool->xmitdata_dma,
+				      xmit_pool->xmitdata_len, DMA_TO_DEVICE);
+	if (vnic_ib_post_send(&data->ib_conn, &rdma_io->io)) {
+		DATA_ERROR("failed to post send for data RDMA write\n");
+		viport_failure(data->parent);
+		goto failure;
+	}
+
+	data_xmits_stats(data);
+failure:
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      xmit_pool->xmitdata_dma,
+				      xmit_pool->xmitdata_len, DMA_TO_DEVICE);
+}
+
+static void data_kick_timeout_handler(unsigned long arg)
+{
+	struct data *data = (struct data *)arg;
+
+	DATA_FUNCTION("data_kick_timeout_handler()\n");
+	data->kick_timer_on = 0;
+	data_send_kick_message(data);
+}
+
+int data_xmit_packet(struct data *data, struct sk_buff *skb)
+{
+	struct xmit_pool *pool = &data->xmit_pool;
+	struct rdma_io *rdma_io;
+	struct buff_pool_entry *bpe;
+	struct viport_trailer *trailer;
+	unsigned int sz = skb->len;
+	int last;
+
+	DATA_FUNCTION("data_xmit_packet()\n");
+	if (sz > pool->buffer_sz) {
+		DATA_ERROR("outbound packet too large, size = %d\n", sz);
+		return -1;
+	}
+
+	if (data_alloc_xmit_buffer(data, skb, &bpe, &rdma_io, &last)) {
+		DATA_ERROR("error in allocating data xmit buffer\n");
+		return -1;
+	}
+
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->xmitdata_dma, pool->xmitdata_len,
+				   DMA_TO_DEVICE);
+	trailer = rdma_io->trailer;
+
+	memset(trailer, 0, sizeof *trailer);
+	memcpy(trailer->dest_mac_addr, skb->data, ETH_ALEN);
+
+	if (skb->sk)
+		trailer->connection_hash_and_valid = 0x40 |
+			((be16_to_cpu(inet_sk(skb->sk)->sport) +
+			  be16_to_cpu(inet_sk(skb->sk)->dport)) & 0x3f);
+
+	trailer->connection_hash_and_valid |= CHV_VALID;
+
+	if ((sz > 16) && (*(__be16 *) (skb->data + 12) ==
+			  __constant_cpu_to_be16(ETH_P_8021Q))) {
+		trailer->vlan = *(__be16 *) (skb->data + 14);
+		memmove(skb->data + 4, skb->data, 12);
+		skb_pull(skb, 4);
+		sz -= 4;
+		trailer->pkt_flags |= PF_VLAN_INSERT;
+	}
+	if (last)
+		trailer->pkt_flags |= PF_KICK;
+	if (sz < ETH_ZLEN) {
+		/* EIOU requires all packets to be
+		 * of ethernet minimum packet size.
+		 */
+		trailer->data_length = __constant_cpu_to_be16(ETH_ZLEN);
+		rdma_io->len = ETH_ZLEN;
+	} else {
+		trailer->data_length = cpu_to_be16(sz);
+		rdma_io->len = sz;
+	}
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		trailer->tx_chksum_flags = TX_CHKSUM_FLAGS_CHECKSUM_V4
+			| TX_CHKSUM_FLAGS_IP_CHECKSUM
+			| TX_CHKSUM_FLAGS_TCP_CHECKSUM
+			| TX_CHKSUM_FLAGS_UDP_CHECKSUM;
+	}
+
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->xmitdata_dma, pool->xmitdata_len,
+				      DMA_TO_DEVICE);
+	data_rdma_packet(data, bpe, rdma_io);
+
+	if (pool->send_kicks) {
+		/* EIOC needs kicks to inform it of sent packets */
+		pool->kick_count++;
+		pool->kick_byte_count += sz;
+		if ((pool->kick_count >= pool->kick_bundle)
+		    || (pool->kick_byte_count >= pool->kick_byte_bundle)) {
+			data_send_kick_message(data);
+		} else if (pool->kick_count == 1) {
+			init_timer(&data->kick_timer);
+			/* timeout_before_kick is in usec */
+			data->kick_timer.expires =
+				msecs_to_jiffies(be32_to_cpu(data->
+					eioc_pool_parms.timeout_before_kick) * 1000) +
+				jiffies;
+			data->kick_timer.data = (unsigned long)data;
+			data->kick_timer.function = data_kick_timeout_handler;
+			add_timer(&data->kick_timer);
+			data->kick_timer_on = 1;
+		}
+	}
+	return 0;
+}
+
+static void data_check_xmit_buffers(struct data *data)
+{
+	struct xmit_pool *pool = &data->xmit_pool;
+	unsigned long flags;
+
+	DATA_FUNCTION("data_check_xmit_buffers()\n");
+	spin_lock_irqsave(&data->xmit_buf_lock, flags);
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->buf_pool_dma, pool->buf_pool_len,
+				   DMA_TO_DEVICE);
+
+	if (data->xmit_pool.need_buffers
+	    && pool->buf_pool[pool->next_xmit_pool].valid
+	    && pool->next_xmit_buf != pool->last_comp_buf) {
+		data->xmit_pool.need_buffers = 0;
+		vnic_restart_xmit(data->parent->vnic,
+				  data->parent->parent);
+		DATA_INFO("there are free xmit buffers\n");
+	}
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->buf_pool_dma, pool->buf_pool_len,
+				      DMA_TO_DEVICE);
+
+	spin_unlock_irqrestore(&data->xmit_buf_lock, flags);
+}
+
+static struct sk_buff *data_recv_to_skbuff(struct data *data,
+					   struct rdma_dest *rdma_dest)
+{
+	struct viport_trailer *trailer;
+	struct sk_buff *skb = NULL;
+	int start;
+	unsigned int len;
+	u8 rx_chksum_flags;
+
+	DATA_FUNCTION("data_recv_to_skbuff()\n");
+	trailer = rdma_dest->trailer;
+	start = data_offset(data, trailer);
+	len = data_len(data, trailer);
+
+	if (len <= min_rcv_skb)
+		skb = dev_alloc_skb(len + VLAN_HLEN + 2);
+	/* leave room for VLAN header and alignment */
+	if (skb) {
+		skb_reserve(skb, VLAN_HLEN + 2);
+		memcpy(skb->data, rdma_dest->data + start, len);
+		skb_put(skb, len);
+	} else {
+		skb = rdma_dest->skb;
+		rdma_dest->skb = NULL;
+		rdma_dest->trailer = NULL;
+		rdma_dest->data = NULL;
+		skb_pull(skb, start);
+		skb_trim(skb, len);
+	}
+
+	rx_chksum_flags = trailer->rx_chksum_flags;
+	DATA_INFO("rx_chksum_flags = %d, LOOP = %c, IP = %c,"
+		  " TCP = %c, UDP = %c\n",
+		  rx_chksum_flags,
+		  (rx_chksum_flags & RX_CHKSUM_FLAGS_LOOPBACK) ? 'Y' : 'N',
+		  (rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED) ? 'Y'
+		  : (rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_FAILED) ? 'N' :
+		  '-',
+		  (rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED) ? 'Y'
+		  : (rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_FAILED) ? 'N' :
+		  '-',
+		  (rx_chksum_flags & RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED) ? 'Y'
+		  : (rx_chksum_flags & RX_CHKSUM_FLAGS_UDP_CHECKSUM_FAILED) ? 'N' :
+		  '-');
+
+	if ((rx_chksum_flags & RX_CHKSUM_FLAGS_LOOPBACK)
+	    || ((rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED)
+		&& ((rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED)
+		    || (rx_chksum_flags &
+			RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED))))
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	else
+		skb->ip_summed = CHECKSUM_NONE;
+
+	if ((trailer->pkt_flags & PF_VLAN_INSERT) &&
+	    !(data->parent->features_supported & VNIC_FEAT_IGNORE_VLAN)) {
+		u8 *rv;
+
+		rv = skb_push(skb, 4);
+		memmove(rv, rv + 4, 12);
+		*(__be16 *) (rv + 12) = __constant_cpu_to_be16(ETH_P_8021Q);
+		if (trailer->pkt_flags & PF_PVID_OVERRIDDEN)
+			*(__be16 *) (rv + 14) = trailer->vlan &
+				__constant_cpu_to_be16(0xF000);
+		else
+			*(__be16 *) (rv + 14) = trailer->vlan;
+	}
+
+	return skb;
+}
+
+static int data_incoming_recv(struct data *data)
+{
+	struct recv_pool *pool = &data->recv_pool;
+	struct rdma_dest *rdma_dest;
+	struct viport_trailer *trailer;
+	struct buff_pool_entry *bpe;
+	struct sk_buff *skb;
+	dma_addr_t vaddr_dma;
+
+	DATA_FUNCTION("data_incoming_recv()\n");
+	if (pool->next_full_buf == pool->next_free_buf)
+		return -1;
+	bpe = &pool->buf_pool[pool->next_full_buf];
+	vaddr_dma = be64_to_cpu(bpe->remote_addr);
+	rdma_dest = &pool->recv_bufs[bpe->valid - 1];
+	trailer = rdma_dest->trailer;
+
+	if (!trailer
+	    || !(trailer->connection_hash_and_valid & CHV_VALID))
+		return -1;
+
+	/* received a packet */
+	if (trailer->pkt_flags & PF_KICK)
+		pool->kick_on_free = 1;
+
+	skb = data_recv_to_skbuff(data, rdma_dest);
+
+	if (skb) {
+		vnic_recv_packet(data->parent->vnic,
+				 data->parent->parent, skb);
+		list_add(&rdma_dest->list_ptrs, &pool->avail_recv_bufs);
+	}
+
+	ib_dma_unmap_single(data->parent->config->ibdev,
+			    vaddr_dma, pool->buffer_sz,
+			    DMA_FROM_DEVICE);
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->buf_pool_dma, pool->buf_pool_len,
+				   DMA_TO_DEVICE);
+
+	bpe->valid = 0;
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->buf_pool_dma, pool->buf_pool_len,
+				      DMA_TO_DEVICE);
+
+	INC(pool->next_full_buf, 1, pool->eioc_pool_sz);
+	pool->num_posted_bufs--;
+	data_recvs_stats(data);
+	return 0;
+}
+
+static void data_received_kick(struct io *io)
+{
+	struct data *data = &io->viport->data;
+	unsigned long flags;
+
+	DATA_FUNCTION("data_received_kick()\n");
+	data_note_kickrcv_time();
+	spin_lock_irqsave(&data->recv_ios_lock, flags);
+	list_add(&io->list_ptrs, &data->recv_ios);
+	spin_unlock_irqrestore(&data->recv_ios_lock, flags);
+	data_post_recvs(data);
+	data_rcvkicks_stats(data);
+	data_check_xmit_buffers(data);
+
+	while (!data_incoming_recv(data));
+
+	if (data->connected) {
+		data_alloc_buffers(data, 0);
+		data_send_free_recv_buffers(data);
+	}
+}
+
+static void data_xmit_complete(struct io *io)
+{
+	struct rdma_io *rdma_io = (struct rdma_io *)io;
+	struct data *data = &io->viport->data;
+	struct xmit_pool *pool = &data->xmit_pool;
+	struct sk_buff *skb;
+
+	DATA_FUNCTION("data_xmit_complete()\n");
+
+	if (rdma_io->skb)
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    rdma_io->skb_data_dma, rdma_io->skb->len,
+				    DMA_TO_DEVICE);
+
+	while (pool->last_comp_buf != rdma_io->index) {
+		INC(pool->last_comp_buf, 1, pool->num_xmit_bufs);
+		skb = pool->xmit_bufs[pool->last_comp_buf].skb;
+		if (skb)
+			dev_kfree_skb_any(skb);
+		pool->xmit_bufs[pool->last_comp_buf].skb = NULL;
+	}
+
+	data_check_xmit_buffers(data);
+}
+
+static int mc_data_alloc_skb(struct ud_recv_io *recv_io, u32 len,
+			     int initial_allocation)
+{
+	struct sk_buff *skb;
+	struct mc_data *mc_data = &recv_io->io.viport->mc_data;
+
+	DATA_FUNCTION("mc_data_alloc_skb\n");
+	if (initial_allocation)
+		skb = alloc_skb(len, GFP_KERNEL);
+	else
+		skb = alloc_skb(len, GFP_ATOMIC);
+	if (!skb) {
+		DATA_ERROR("failed to alloc MULTICAST skb\n");
+		return -1;
+	}
+	skb_put(skb, len);
+	recv_io->skb = skb;
+
+	recv_io->skb_data_dma = ib_dma_map_single(
+					recv_io->io.viport->config->ibdev,
+					skb->data, skb->len,
+					DMA_FROM_DEVICE);
+
+	if (ib_dma_mapping_error(recv_io->io.viport->config->ibdev,
+				 recv_io->skb_data_dma)) {
+		DATA_ERROR("skb data dma map error\n");
+		dev_kfree_skb(skb);
+		return -1;
+	}
+
+	recv_io->list[0].addr = recv_io->skb_data_dma;
+	recv_io->list[0].length = sizeof(struct ib_grh);
+	recv_io->list[0].lkey = mc_data->mr->lkey;
+
+	recv_io->list[1].addr = recv_io->skb_data_dma + sizeof(struct ib_grh);
+	recv_io->list[1].length = len - sizeof(struct ib_grh);
+	recv_io->list[1].lkey = mc_data->mr->lkey;
+
+	recv_io->io.rwr.wr_id = (u64)&recv_io->io;
+	recv_io->io.rwr.sg_list = recv_io->list;
+	recv_io->io.rwr.num_sge = 2;
+	recv_io->io.rwr.next = NULL;
+
+	return 0;
+}
+
+static int mc_data_alloc_buffers(struct mc_data *mc_data)
+{
+	unsigned int i, num;
+	struct ud_recv_io *bufs = NULL, *recv_io;
+
+	DATA_FUNCTION("mc_data_alloc_buffers\n");
+	if (!mc_data->skb_len) {
+		unsigned int len;
+		/* align multicast msg buffer on viport_trailer boundary */
+		len = (MCAST_MSG_SIZE + VIPORT_TRAILER_ALIGNMENT - 1) &
+		      (~((unsigned int)VIPORT_TRAILER_ALIGNMENT - 1));
+		/*
+		 * Add size of grh and trailer -
+		 * note, we don't need a + 4 for vlan because we have room in
+		 * netbuf for grh & trailer and we'll strip them both, so there
+		 * will be room enough to handle the 4 byte insertion for vlan.
+		 */
+		len += sizeof(struct ib_grh) +
+		       sizeof(struct viport_trailer);
+		mc_data->skb_len = len;
+		DATA_INFO("mc_data->skb_len %d (sizes:%d %d)\n",
+			  len, (int)sizeof(struct ib_grh),
+			  (int)sizeof(struct viport_trailer));
+	}
+	mc_data->recv_len = sizeof(struct ud_recv_io) * mc_data->num_recvs;
+	bufs = kmalloc(mc_data->recv_len, GFP_KERNEL);
+	if (!bufs) {
+		DATA_ERROR("failed to allocate MULTICAST buffers size:%d\n",
+			   mc_data->recv_len);
+		return -1;
+	}
+	DATA_INFO("allocated num_recvs:%d recv_len:%d \n",
+		  mc_data->num_recvs, mc_data->recv_len);
+	for (num = 0; num < mc_data->num_recvs; num++) {
+		recv_io = &bufs[num];
+		recv_io->len = mc_data->skb_len;
+		recv_io->io.type = RECV_UD;
+		recv_io->io.viport = mc_data->parent;
+		recv_io->io.routine = mc_data_recv_routine;
+
+		if (mc_data_alloc_skb(recv_io, mc_data->skb_len, 1)) {
+			for (i = 0; i < num; i++) {
+				recv_io = &bufs[i];
+				ib_dma_unmap_single(recv_io->io.viport->config->ibdev,
+						    recv_io->skb_data_dma,
+						    recv_io->skb->len,
+						    DMA_FROM_DEVICE);
+				dev_kfree_skb(recv_io->skb);
+			}
+			kfree(bufs);
+			return -1;
+		}
+		list_add_tail(&recv_io->io.list_ptrs,
+			      &mc_data->avail_recv_ios_list);
+	}
+	mc_data->recv_ios = bufs;
+	return 0;
+}
+
+void mc_data_cleanup(struct mc_data *mc_data)
+{
+	DATA_FUNCTION("mc_data_cleanup\n");
+	completion_callback_cleanup(&mc_data->ib_conn);
+	if (!IS_ERR(mc_data->ib_conn.qp)) {
+		ib_destroy_qp(mc_data->ib_conn.qp);
+		mc_data->ib_conn.qp = (struct ib_qp *)ERR_PTR(-EINVAL);
+	}
+	if (!IS_ERR(mc_data->ib_conn.cq)) {
+		ib_destroy_cq(mc_data->ib_conn.cq);
+		mc_data->ib_conn.cq = (struct ib_cq *)ERR_PTR(-EINVAL);
+	}
+	kfree(mc_data->recv_ios);
+	mc_data->recv_ios = (struct ud_recv_io *)NULL;
+	if (mc_data->mr) {
+		ib_dereg_mr(mc_data->mr);
+		mc_data->mr = (struct ib_mr *)NULL;
+	}
+	DATA_FUNCTION("mc_data_cleanup done\n");
+
+}
+
+int mc_data_init(struct mc_data *mc_data, struct viport *viport,
+		 struct data_config *config, struct ib_pd *pd)
+{
+	DATA_FUNCTION("mc_data_init()\n");
+
+	mc_data->num_recvs = viport->data.config->num_recvs;
+
+	INIT_LIST_HEAD(&mc_data->avail_recv_ios_list);
+	spin_lock_init(&mc_data->recv_lock);
+
+	mc_data->parent = viport;
+	mc_data->config = config;
+
+	mc_data->ib_conn.cm_id = NULL;
+	mc_data->ib_conn.viport = viport;
+	mc_data->ib_conn.ib_config = &config->ib_config;
+	mc_data->ib_conn.state = IB_CONN_UNINITTED;
+	mc_data->ib_conn.callback_thread = NULL;
+	mc_data->ib_conn.callback_thread_end = 0;
+
+	if (vnic_ib_mc_init(mc_data, viport, pd,
+			    &config->ib_config)) {
+		DATA_ERROR("vnic_ib_mc_init failed\n");
+		goto failure;
+	}
+	mc_data->mr = ib_get_dma_mr(pd,
+				    IB_ACCESS_LOCAL_WRITE |
+				    IB_ACCESS_REMOTE_WRITE);
+	if (IS_ERR(mc_data->mr)) {
+		DATA_ERROR("failed to register memory for"
+			   " mc_data connection\n");
+		goto destroy_conn;
+	}
+
+	if (mc_data_alloc_buffers(mc_data))
+		goto dereg_mr;
+
+	mc_data_post_recvs(mc_data);
+	if (vnic_ib_mc_mod_qp_to_rts(mc_data->ib_conn.qp))
+		goto dereg_mr;
+
+	return 0;
+
+dereg_mr:
+	ib_dereg_mr(mc_data->mr);
+	mc_data->mr = (struct ib_mr *)NULL;
+destroy_conn:
+	completion_callback_cleanup(&mc_data->ib_conn);
+	ib_destroy_qp(mc_data->ib_conn.qp);
+	mc_data->ib_conn.qp = (struct ib_qp *)ERR_PTR(-EINVAL);
+	ib_destroy_cq(mc_data->ib_conn.cq);
+	mc_data->ib_conn.cq = (struct ib_cq *)ERR_PTR(-EINVAL);
+failure:
+	return -1;
+}
+
+static void mc_data_post_recvs(struct mc_data *mc_data)
+{
+	unsigned long flags;
+	int i = 0;
+	DATA_FUNCTION("mc_data_post_recvs\n");
+	spin_lock_irqsave(&mc_data->recv_lock, flags);
+	while (!list_empty(&mc_data->avail_recv_ios_list)) {
+		struct io *io = list_entry(mc_data->avail_recv_ios_list.next,
+					   struct io, list_ptrs);
+		struct ud_recv_io *recv_io =
+			container_of(io, struct ud_recv_io, io);
+		list_del(&recv_io->io.list_ptrs);
+		spin_unlock_irqrestore(&mc_data->recv_lock, flags);
+		if (vnic_ib_mc_post_recv(mc_data, &recv_io->io)) {
+			viport_failure(mc_data->parent);
+			return;
+		}
+		spin_lock_irqsave(&mc_data->recv_lock, flags);
+		i++;
+	}
+	DATA_INFO("mcdata posted %d %p\n", i, &mc_data->avail_recv_ios_list);
+	spin_unlock_irqrestore(&mc_data->recv_lock, flags);
+}
+
+static void mc_data_recv_routine(struct io *io)
+{
+	struct sk_buff *skb;
+	struct ib_grh *grh;
+	struct viport_trailer *trailer;
+	struct mc_data *mc_data;
+	unsigned long flags;
+	struct ud_recv_io *recv_io = container_of(io, struct ud_recv_io, io);
+	union ib_gid_cpu sgid;
+
+	DATA_FUNCTION("mc_data_recv_routine\n");
+	skb = recv_io->skb;
+	grh = (struct ib_grh *)skb->data;
+	mc_data = &recv_io->io.viport->mc_data;
+
+	ib_dma_unmap_single(recv_io->io.viport->config->ibdev,
+			    recv_io->skb_data_dma, recv_io->skb->len,
+			    DMA_FROM_DEVICE);
+
+	/* first - check if we've got our own mc packet */
+	/* convert sgid from host to cpu form before comparing */
+	bswap_ib_gid(&grh->sgid, &sgid);
+	if (cpu_to_be64(sgid.global.interface_id) ==
+	    io->viport->config->path_info.path.sgid.global.interface_id) {
+		DATA_ERROR("dropping - our mc packet\n");
+		dev_kfree_skb(skb);
+	} else {
+		/* GRH is at head and trailer at end. Remove GRH from head. */
+		trailer = (struct viport_trailer *)
+			  (skb->data + recv_io->len -
+			   sizeof(struct viport_trailer));
+		skb_pull(skb, sizeof(struct ib_grh));
+		if (trailer->connection_hash_and_valid & CHV_VALID) {
+			mc_data_recv_to_skbuff(io->viport, skb, trailer);
+			vnic_recv_packet(io->viport->vnic, io->viport->parent,
+					 skb);
+			vnic_multicast_recv_pkt_stats(io->viport->vnic);
+		} else {
+			DATA_ERROR("dropping - no CHV_VALID in HashAndValid\n");
+			dev_kfree_skb(skb);
+		}
+	}
+	recv_io->skb = NULL;
+	if (mc_data_alloc_skb(recv_io, mc_data->skb_len, 0))
+		return;
+
+	spin_lock_irqsave(&mc_data->recv_lock, flags);
+	list_add_tail(&recv_io->io.list_ptrs, &mc_data->avail_recv_ios_list);
+	spin_unlock_irqrestore(&mc_data->recv_lock, flags);
+	mc_data_post_recvs(mc_data);
+	return;
+}
+
+static void mc_data_recv_to_skbuff(struct viport *viport, struct sk_buff *skb,
+				   struct viport_trailer *trailer)
+{
+	u8 rx_chksum_flags = trailer->rx_chksum_flags;
+
+	/* drop alignment bytes at start */
+	skb_pull(skb, trailer->data_alignment_offset);
+	/* drop excess from end */
+	skb_trim(skb, __be16_to_cpu(trailer->data_length));
+
+	if ((rx_chksum_flags & RX_CHKSUM_FLAGS_LOOPBACK)
+	    || ((rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED)
+		&& ((rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED)
+		    || (rx_chksum_flags &
+			RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED))))
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	else
+		skb->ip_summed = CHECKSUM_NONE;
+
+	if ((trailer->pkt_flags & PF_VLAN_INSERT) &&
+	    !(viport->features_supported & VNIC_FEAT_IGNORE_VLAN)) {
+		u8 *rv;
+
+		/* insert VLAN id between source & length */
+		DATA_INFO("VLAN adjustment\n");
+		rv = skb_push(skb, 4);
+		memmove(rv, rv + 4, 12);
+		*(__be16 *) (rv + 12) = __constant_cpu_to_be16(ETH_P_8021Q);
+		if (trailer->pkt_flags & PF_PVID_OVERRIDDEN)
+			/*
+			 * Indicates VLAN is 0 but we keep the protocol id.
+			 */
+			*(__be16 *) (rv + 14) = trailer->vlan &
+				__constant_cpu_to_be16(0xF000);
+		else
+			*(__be16 *) (rv + 14) = trailer->vlan;
+		DATA_INFO("vlan:%x\n", *(int *)(rv+14));
+	}
+
+	return;
+}
diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_data.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_data.h
new file mode 100644
index 0000000..ad77aa9
--- /dev/null
+++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_data.h
@@ -0,0 +1,206 @@
+/*
+ * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef VNIC_DATA_H_INCLUDED
+#define VNIC_DATA_H_INCLUDED
+
+#include
+
+#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS
+#include
+#endif	/* CONFIG_INFINIBAND_QLGC_VNIC_STATS */
+
+#include "vnic_ib.h"
+#include "vnic_control_pkt.h"
+#include "vnic_trailer.h"
+
+struct rdma_dest {
+	struct list_head list_ptrs;
+	struct sk_buff *skb;
+	u8 *data;
+	struct viport_trailer *trailer __attribute__((aligned(32)));
+};
+
+struct buff_pool_entry {
+	__be64 remote_addr;
+	__be32 rkey;
+	u32 valid;
+};
+
+struct recv_pool {
+	u32 buffer_sz;
+	u32 pool_sz;
+	u32 eioc_pool_sz;
+	u32 eioc_rdma_rkey;
+	u64 eioc_rdma_addr;
+	u32 next_full_buf;
+	u32 next_free_buf;
+	u32 num_free_bufs;
+	u32 num_posted_bufs;
+	u32 sz_free_bundle;
+	int kick_on_free;
+	struct buff_pool_entry *buf_pool;
+	dma_addr_t buf_pool_dma;
+	int buf_pool_len;
+	struct rdma_dest *recv_bufs;
+	struct list_head avail_recv_bufs;
+};
+
+struct xmit_pool {
+	u32 buffer_sz;
+	u32 pool_sz;
+	u32 notify_count;
+	u32 notify_bundle;
+	u32 next_xmit_buf;
+	u32 last_comp_buf;
+	u32 num_xmit_bufs;
+	u32 next_xmit_pool;
+	u32 kick_count;
+	u32 kick_byte_count;
+	u32 kick_bundle;
+	u32 kick_byte_bundle;
+	int need_buffers;
+	int send_kicks;
+	uint32_t rdma_rkey;
+	u64 rdma_addr;
+	struct buff_pool_entry *buf_pool;
+	dma_addr_t buf_pool_dma;
+	int buf_pool_len;
+	struct rdma_io *xmit_bufs;
+	u8 *xmit_data;
+	dma_addr_t xmitdata_dma;
+	int xmitdata_len;
+};
+
+struct data {
+	struct viport *parent;
+	struct data_config *config;
+	struct ib_mr *mr;
+	struct vnic_ib_conn ib_conn;
+	u8 *local_storage;
+	struct vnic_recv_pool_config host_pool_parms;
+	struct vnic_recv_pool_config eioc_pool_parms;
+	struct recv_pool recv_pool;
+	struct xmit_pool xmit_pool;
+	u8 *region_data;
+	dma_addr_t region_data_dma;
+	struct rdma_io free_bufs_io;
+	struct send_io kick_io;
+	struct list_head recv_ios;
+	spinlock_t recv_ios_lock;
+	spinlock_t xmit_buf_lock;
+	int kick_timer_on;
+	int connected;
+	u16 max_mtu;
+	struct timer_list kick_timer;
+	struct completion done;
+#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS
+	struct {
+		u32 xmit_num;
+		u32 recv_num;
+		u32 free_buf_sends;
+		u32 free_buf_num;
+		u32 free_buf_min;
+		u32 kick_recvs;
+		u32 kick_reqs;
+		u32 no_xmit_bufs;
+		cycles_t no_xmit_buf_time;
+	} statistics;
+#endif	/* CONFIG_INFINIBAND_QLGC_VNIC_STATS */
+};
+
+struct mc_data {
+	struct viport *parent;
+	struct data_config *config;
+	struct ib_mr *mr;
+	struct vnic_ib_conn ib_conn;
+
+	u32 num_recvs;
+	u32 skb_len;
+	spinlock_t recv_lock;
+	int recv_len;
+	struct ud_recv_io *recv_ios;
+	struct list_head avail_recv_ios_list;
+};
+
+int data_init(struct data *data, struct viport *viport,
+	      struct data_config *config, struct ib_pd *pd);
+
+int data_connect(struct data *data);
+void data_connected(struct data *data);
+void data_disconnect(struct data *data);
+
+int data_xmit_packet(struct data *data, struct sk_buff *skb);
+
+void data_cleanup(struct data *data);
+
+#define data_is_connected(data)		\
+	(vnic_ib_conn_connected(&((data)->ib_conn)))
+#define data_path_id(data)		(data)->config->path_id
+#define data_eioc_pool(data)		&(data)->eioc_pool_parms
+#define data_host_pool(data)		&(data)->host_pool_parms
+#define data_eioc_pool_min(data)	&(data)->config->eioc_min
+#define data_host_pool_min(data)	&(data)->config->host_min
+#define data_eioc_pool_max(data)	&(data)->config->eioc_max
+#define data_host_pool_max(data)	&(data)->config->host_max
+#define data_local_pool_addr(data)	(data)->xmit_pool.rdma_addr
+#define data_local_pool_rkey(data)	(data)->xmit_pool.rdma_rkey
+#define data_remote_pool_addr(data)	&(data)->recv_pool.eioc_rdma_addr
+#define data_remote_pool_rkey(data)	&(data)->recv_pool.eioc_rdma_rkey
+
+#define data_max_mtu(data)	(data)->max_mtu
+
+
+#define data_len(data, trailer)	be16_to_cpu(trailer->data_length)
+#define data_offset(data, trailer)	\
+	((data)->recv_pool.buffer_sz - sizeof(struct viport_trailer)	\
+	 - ALIGN(data_len((data), (trailer)), VIPORT_TRAILER_ALIGNMENT)	\
+	 +
(trailer->data_alignment_offset)) + +/* the following macros manipulate ring buffer indexes. + * the ring buffer size must be a power of 2. + */ +#define ADD(index, increment, size) (((index) + (increment))&((size) - 1)) +#define NEXT(index, size) ADD(index, 1, size) +#define INC(index, increment, size) (index) = ADD(index, increment, size) + +/* this is max multicast msg embedded will send */ +#define MCAST_MSG_SIZE \ + (2048 - sizeof(struct ib_grh) - sizeof(struct viport_trailer)) + +int mc_data_init(struct mc_data *mc_data, struct viport *viport, + struct data_config *config, + struct ib_pd *pd); + +void mc_data_cleanup(struct mc_data *mc_data); + +#endif /* VNIC_DATA_H_INCLUDED */ diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h new file mode 100644 index 0000000..dd8a073 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h @@ -0,0 +1,103 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_TRAILER_H_INCLUDED +#define VNIC_TRAILER_H_INCLUDED + +/* pkt_flags values */ +enum { + PF_CHASH_VALID = 0x01, + PF_IPSEC_VALID = 0x02, + PF_TCP_SEGMENT = 0x04, + PF_KICK = 0x08, + PF_VLAN_INSERT = 0x10, + PF_PVID_OVERRIDDEN = 0x20, + PF_FCS_INCLUDED = 0x40, + PF_FORCE_ROUTE = 0x80 +}; + +/* tx_chksum_flags values */ +enum { + TX_CHKSUM_FLAGS_CHECKSUM_V4 = 0x01, + TX_CHKSUM_FLAGS_CHECKSUM_V6 = 0x02, + TX_CHKSUM_FLAGS_TCP_CHECKSUM = 0x04, + TX_CHKSUM_FLAGS_UDP_CHECKSUM = 0x08, + TX_CHKSUM_FLAGS_IP_CHECKSUM = 0x10 +}; + +/* rx_chksum_flags values */ +enum { + RX_CHKSUM_FLAGS_TCP_CHECKSUM_FAILED = 0x01, + RX_CHKSUM_FLAGS_UDP_CHECKSUM_FAILED = 0x02, + RX_CHKSUM_FLAGS_IP_CHECKSUM_FAILED = 0x04, + RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED = 0x08, + RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED = 0x10, + RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED = 0x20, + RX_CHKSUM_FLAGS_LOOPBACK = 0x40, + RX_CHKSUM_FLAGS_RESERVED = 0x80 +}; + +/* connection_hash_and_valid values */ +enum { + CHV_VALID = 0x80, + CHV_HASH_MASH = 0x7f +}; + +struct viport_trailer { + s8 data_alignment_offset; + u8 rndis_header_length; /* reserved for use by edp */ + __be16 data_length; + u8 pkt_flags; + u8 tx_chksum_flags; + u8 rx_chksum_flags; + u8 ip_sec_flags; + u32 tcp_seq_no; + u32 ip_sec_offload_handle; + u32 ip_sec_next_offload_handle; + u8 dest_mac_addr[6]; + __be16 vlan; + u16 time_stamp; + u8 origin; + u8 connection_hash_and_valid; +}; + +#define VIPORT_TRAILER_ALIGNMENT 32 + +#define BUFFER_SIZE(len) \ 
+ (sizeof(struct viport_trailer) + \ + ALIGN((len), VIPORT_TRAILER_ALIGNMENT)) + +#define MAX_PAYLOAD(len) \ + ALIGN_DOWN((len) - sizeof(struct viport_trailer), \ + VIPORT_TRAILER_ALIGNMENT) + +#endif /* VNIC_TRAILER_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:19:25 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:49:25 +0530 Subject: [ofa-general] [PATCH 07/13] QLogic VNIC: Handling configurable parameters of the driver In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171925.31725.22023.stgit@localhost.localdomain> From: Poornima Kamath This patch adds the files that handle the various configurable parameters of the VNIC driver: configuration of the virtual NIC, of the control and data connections to the EVIC, and of the general IB connection parameters. Signed-off-by: Ramachandra K Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_config.c | 380 ++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_config.h | 242 +++++++++++++++ 2 files changed, 622 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_config.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_config.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_config.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_config.c new file mode 100644 index 0000000..86d99b6 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_config.c @@ -0,0 +1,380 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include +#include +#include + +#include + +#include "vnic_util.h" +#include "vnic_config.h" +#include "vnic_trailer.h" +#include "vnic_main.h" + +u16 vnic_max_mtu = MAX_MTU; + +static u32 default_no_path_timeout = DEFAULT_NO_PATH_TIMEOUT; +static u32 sa_path_rec_get_timeout = SA_PATH_REC_GET_TIMEOUT; +static u32 default_primary_reconnect_timeout = + DEFAULT_PRIMARY_RECONNECT_TIMEOUT; +static u32 default_primary_switch_timeout = DEFAULT_PRIMARY_SWITCH_TIMEOUT; +static int default_prefer_primary = DEFAULT_PREFER_PRIMARY; + +static int use_rx_csum = VNIC_USE_RX_CSUM; +static int use_tx_csum = VNIC_USE_TX_CSUM; + +static u32 control_response_timeout = CONTROL_RSP_TIMEOUT; +static u32 completion_limit = DEFAULT_COMPLETION_LIMIT; + +module_param(vnic_max_mtu, ushort, 0444); +MODULE_PARM_DESC(vnic_max_mtu, "Maximum MTU size (1500-9500). Default is 9500"); + +module_param(default_prefer_primary, bool, 0444); +MODULE_PARM_DESC(default_prefer_primary, "Determines if primary path is" + " preferred (1) or not (0). Defaults to 0"); +module_param(use_rx_csum, bool, 0444); +MODULE_PARM_DESC(use_rx_csum, "Determines if RX checksum is done on VEx (1)" + " or not (0). Defaults to 1"); +module_param(use_tx_csum, bool, 0444); +MODULE_PARM_DESC(use_tx_csum, "Determines if TX checksum is done on VEx (1)" + " or not (0). 
Defaults to 1"); +module_param(default_no_path_timeout, uint, 0444); +MODULE_PARM_DESC(default_no_path_timeout, "Time to wait in milliseconds" + " before reconnecting to VEx after connection loss"); +module_param(default_primary_reconnect_timeout, uint, 0444); +MODULE_PARM_DESC(default_primary_reconnect_timeout, "Time to wait in" + " milliseconds before reconnecting the" + " primary path to VEx"); +module_param(default_primary_switch_timeout, uint, 0444); +MODULE_PARM_DESC(default_primary_switch_timeout, "Time to wait before" + " switching back to primary path if" + " primary path is preferred"); +module_param(sa_path_rec_get_timeout, uint, 0444); +MODULE_PARM_DESC(sa_path_rec_get_timeout, "Time out value in milliseconds" + " for SA path record get queries"); + +module_param(control_response_timeout, uint, 0444); +MODULE_PARM_DESC(control_response_timeout, "Time out value in milliseconds" + " to wait for response to control requests"); + +module_param(completion_limit, uint, 0444); +MODULE_PARM_DESC(completion_limit, "Maximum completions to process" + " in a single completion callback invocation. 
Default is 100" + " Minimum value is 10"); + +static void config_control_defaults(struct control_config *control_config, + struct path_param *params) +{ + int len; + char *dot; + u64 sid; + + sid = (SST_AGN << 56) | (SST_OUI << 32) | (CONTROL_PATH_ID << 8) + | IOC_NUMBER(be64_to_cpu(params->ioc_guid)); + + control_config->ib_config.service_id = cpu_to_be64(sid); + control_config->ib_config.conn_data.path_id = 0; + control_config->ib_config.conn_data.vnic_instance = params->instance; + control_config->ib_config.conn_data.path_num = 0; + control_config->ib_config.conn_data.features_supported = + __constant_cpu_to_be32((u32) (VNIC_FEAT_IGNORE_VLAN | + VNIC_FEAT_RDMA_IMMED)); + dot = strchr(init_utsname()->nodename, '.'); + + if (dot) + len = dot - init_utsname()->nodename; + else + len = strlen(init_utsname()->nodename); + + if (len > VNIC_MAX_NODENAME_LEN) + len = VNIC_MAX_NODENAME_LEN; + + memcpy(control_config->ib_config.conn_data.nodename, + init_utsname()->nodename, len); + + if (params->ib_multicast == 1) + control_config->ib_multicast = 1; + else if (params->ib_multicast == 0) + control_config->ib_multicast = 0; + else { + /* parameter is not set - enable it by default */ + control_config->ib_multicast = 1; + CONFIG_ERROR("IOCGUID=%llx INSTANCE=%d IB_MULTICAST defaulted" + " to TRUE\n", params->ioc_guid, + (char)params->instance); + } + + if (control_config->ib_multicast) + control_config->ib_config.conn_data.features_supported |= + __constant_cpu_to_be32(VNIC_FEAT_INBOUND_IB_MC); + + control_config->ib_config.retry_count = RETRY_COUNT; + control_config->ib_config.rnr_retry_count = RETRY_COUNT; + control_config->ib_config.min_rnr_timer = MIN_RNR_TIMER; + + /* These values are not configurable*/ + control_config->ib_config.num_recvs = 5; + control_config->ib_config.num_sends = 1; + control_config->ib_config.recv_scatter = 1; + control_config->ib_config.send_gather = 1; + control_config->ib_config.completion_limit = completion_limit; + + control_config->num_recvs 
= control_config->ib_config.num_recvs; + + control_config->vnic_instance = params->instance; + control_config->max_address_entries = MAX_ADDRESS_ENTRIES; + control_config->min_address_entries = MIN_ADDRESS_ENTRIES; + control_config->rsp_timeout = msecs_to_jiffies(control_response_timeout); +} + +static void config_data_defaults(struct data_config *data_config, + struct path_param *params) +{ + u64 sid; + + sid = (SST_AGN << 56) | (SST_OUI << 32) | (DATA_PATH_ID << 8) + | IOC_NUMBER(be64_to_cpu(params->ioc_guid)); + + data_config->ib_config.service_id = cpu_to_be64(sid); + data_config->ib_config.conn_data.path_id = jiffies; /* random */ + data_config->ib_config.conn_data.vnic_instance = params->instance; + data_config->ib_config.conn_data.path_num = 0; + + data_config->ib_config.retry_count = RETRY_COUNT; + data_config->ib_config.rnr_retry_count = RETRY_COUNT; + data_config->ib_config.min_rnr_timer = MIN_RNR_TIMER; + + /* + * NOTE: the num_recvs size assumes that the EIOC could + * RDMA enough packets to fill all of the host recv + * pool entries, plus send a kick message after each + * packet, plus RDMA new buffers for the size of + * the EIOC recv buffer pool, plus send kick messages + * after each min_host_update_sz of new buffers all + * before the host can even pull off the first completed + * receive off the completion queue, and repost the + * receive. NOT LIKELY! 
+ */ + data_config->ib_config.num_recvs = HOST_RECV_POOL_ENTRIES + + (MAX_EIOC_POOL_SZ / MIN_HOST_UPDATE_SZ); + + data_config->ib_config.num_sends = (2 * NOTIFY_BUNDLE_SZ) + + (HOST_RECV_POOL_ENTRIES / MIN_EIOC_UPDATE_SZ) + 1; + + data_config->ib_config.recv_scatter = 1; /* not configurable */ + data_config->ib_config.send_gather = 2; /* not configurable */ + data_config->ib_config.completion_limit = completion_limit; + + data_config->num_recvs = data_config->ib_config.num_recvs; + data_config->path_id = data_config->ib_config.conn_data.path_id; + + + data_config->host_recv_pool_entries = HOST_RECV_POOL_ENTRIES; + + data_config->host_min.size_recv_pool_entry = + cpu_to_be32(BUFFER_SIZE(VLAN_ETH_HLEN + MIN_MTU)); + data_config->host_max.size_recv_pool_entry = + cpu_to_be32(BUFFER_SIZE(VLAN_ETH_HLEN + vnic_max_mtu)); + data_config->eioc_min.size_recv_pool_entry = + cpu_to_be32(BUFFER_SIZE(VLAN_ETH_HLEN + MIN_MTU)); + data_config->eioc_max.size_recv_pool_entry = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + + data_config->host_min.num_recv_pool_entries = + __constant_cpu_to_be32(MIN_HOST_POOL_SZ); + data_config->host_max.num_recv_pool_entries = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + data_config->eioc_min.num_recv_pool_entries = + __constant_cpu_to_be32(MIN_EIOC_POOL_SZ); + data_config->eioc_max.num_recv_pool_entries = + __constant_cpu_to_be32(MAX_EIOC_POOL_SZ); + + data_config->host_min.timeout_before_kick = + __constant_cpu_to_be32(MIN_HOST_KICK_TIMEOUT); + data_config->host_max.timeout_before_kick = + __constant_cpu_to_be32(MAX_HOST_KICK_TIMEOUT); + data_config->eioc_min.timeout_before_kick = 0; + data_config->eioc_max.timeout_before_kick = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + + data_config->host_min.num_recv_pool_entries_before_kick = + __constant_cpu_to_be32(MIN_HOST_KICK_ENTRIES); + data_config->host_max.num_recv_pool_entries_before_kick = + __constant_cpu_to_be32(MAX_HOST_KICK_ENTRIES); + data_config->eioc_min.num_recv_pool_entries_before_kick = 0; 
+ data_config->eioc_max.num_recv_pool_entries_before_kick = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + + data_config->host_min.num_recv_pool_bytes_before_kick = + __constant_cpu_to_be32(MIN_HOST_KICK_BYTES); + data_config->host_max.num_recv_pool_bytes_before_kick = + __constant_cpu_to_be32(MAX_HOST_KICK_BYTES); + data_config->eioc_min.num_recv_pool_bytes_before_kick = 0; + data_config->eioc_max.num_recv_pool_bytes_before_kick = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + + data_config->host_min.free_recv_pool_entries_per_update = + __constant_cpu_to_be32(MIN_HOST_UPDATE_SZ); + data_config->host_max.free_recv_pool_entries_per_update = + __constant_cpu_to_be32(MAX_HOST_UPDATE_SZ); + data_config->eioc_min.free_recv_pool_entries_per_update = + __constant_cpu_to_be32(MIN_EIOC_UPDATE_SZ); + data_config->eioc_max.free_recv_pool_entries_per_update = + __constant_cpu_to_be32(MAX_EIOC_UPDATE_SZ); + + data_config->notify_bundle = NOTIFY_BUNDLE_SZ; +} + +static void config_path_info_defaults(struct viport_config *config, + struct path_param *params) +{ + int i; + ib_get_cached_gid(config->ibdev, config->port, 0, + &config->path_info.path.sgid); + for (i = 0; i < 16; i++) + config->path_info.path.dgid.raw[i] = params->dgid[i]; + + config->path_info.path.pkey = params->pkey; + config->path_info.path.numb_path = 1; + config->sa_path_rec_get_timeout = sa_path_rec_get_timeout; + +} + +static void config_viport_defaults(struct viport_config *config, + struct path_param *params) +{ + config->ibdev = params->ibdev; + config->port = params->port; + config->ioc_guid = params->ioc_guid; + config->stats_interval = msecs_to_jiffies(VIPORT_STATS_INTERVAL); + config->hb_interval = msecs_to_jiffies(VIPORT_HEARTBEAT_INTERVAL); + config->hb_timeout = VIPORT_HEARTBEAT_TIMEOUT * 1000; + /*hb_timeout needs to be in usec*/ + strcpy(config->ioc_string, params->ioc_string); + config_path_info_defaults(config, params); + + config_control_defaults(&config->control_config, params); + 
config_data_defaults(&config->data_config, params); +} + +static void config_vnic_defaults(struct vnic_config *config) +{ + config->no_path_timeout = msecs_to_jiffies(default_no_path_timeout); + config->primary_connect_timeout = + msecs_to_jiffies(DEFAULT_PRIMARY_CONNECT_TIMEOUT); + config->primary_reconnect_timeout = + msecs_to_jiffies(default_primary_reconnect_timeout); + config->primary_switch_timeout = + msecs_to_jiffies(default_primary_switch_timeout); + config->prefer_primary = default_prefer_primary; + config->use_rx_csum = use_rx_csum; + config->use_tx_csum = use_tx_csum; +} + +struct viport_config *config_alloc_viport(struct path_param *params) +{ + struct viport_config *config; + + config = kzalloc(sizeof *config, GFP_KERNEL); + if (!config) { + CONFIG_ERROR("could not allocate memory for" + " struct viport_config\n"); + return NULL; + } + + config_viport_defaults(config, params); + + return config; +} + +struct vnic_config *config_alloc_vnic(void) +{ + struct vnic_config *config; + + config = kzalloc(sizeof *config, GFP_KERNEL); + if (!config) { + CONFIG_ERROR("couldn't allocate memory for" + " struct vnic_config\n"); + + return NULL; + } + + config_vnic_defaults(config); + return config; +} + +char *config_viport_name(struct viport_config *config) +{ + /* function only called by one thread, can return a static string */ + static char str[64]; + + sprintf(str, "GUID %llx instance %d", + be64_to_cpu(config->ioc_guid), + config->control_config.vnic_instance); + return str; +} + +int config_start(void) +{ + vnic_max_mtu = min_t(u16, vnic_max_mtu, MAX_MTU); + vnic_max_mtu = max_t(u16, vnic_max_mtu, MIN_MTU); + + sa_path_rec_get_timeout = min_t(u32, sa_path_rec_get_timeout, + MAX_SA_TIMEOUT); + sa_path_rec_get_timeout = max_t(u32, sa_path_rec_get_timeout, + MIN_SA_TIMEOUT); + + control_response_timeout = min_t(u32, control_response_timeout, + MAX_CONTROL_RSP_TIMEOUT); + + control_response_timeout = max_t(u32, control_response_timeout, + 
MIN_CONTROL_RSP_TIMEOUT); + + completion_limit = max_t(u32, completion_limit, + MIN_COMPLETION_LIMIT); + + if (!default_no_path_timeout) + default_no_path_timeout = DEFAULT_NO_PATH_TIMEOUT; + + if (!default_primary_reconnect_timeout) + default_primary_reconnect_timeout = + DEFAULT_PRIMARY_RECONNECT_TIMEOUT; + + if (!default_primary_switch_timeout) + default_primary_switch_timeout = + DEFAULT_PRIMARY_SWITCH_TIMEOUT; + + return 0; + +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_config.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_config.h new file mode 100644 index 0000000..c5b00b9 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_config.h @@ -0,0 +1,242 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_CONFIG_H_INCLUDED +#define VNIC_CONFIG_H_INCLUDED + +#include +#include +#include + +#include "vnic_control.h" +#include "vnic_ib.h" + +#define SST_AGN 0x10ULL +#define SST_OUI 0x00066AULL + +enum { + CONTROL_PATH_ID = 0x0, + DATA_PATH_ID = 0x1 +}; + +#define IOC_NUMBER(GUID) (((GUID) >> 32) & 0xFF) + +enum { + VNIC_CLASS_SUBCLASS = 0x2000066A, + VNIC_PROTOCOL = 0, + VNIC_PROT_VERSION = 1 +}; + +enum { + MIN_MTU = 1500, /* minimum negotiated MTU size */ + MAX_MTU = 9500 /* jumbo frame */ +}; + +/* + * TODO: tune the pool parameter values + */ +enum { + MIN_ADDRESS_ENTRIES = 16, + MAX_ADDRESS_ENTRIES = 64 +}; + +enum { + HOST_RECV_POOL_ENTRIES = 512, + MIN_HOST_POOL_SZ = 64, + MIN_EIOC_POOL_SZ = 64, + MAX_EIOC_POOL_SZ = 256, + MIN_HOST_UPDATE_SZ = 8, + MAX_HOST_UPDATE_SZ = 32, + MIN_EIOC_UPDATE_SZ = 8, + MAX_EIOC_UPDATE_SZ = 32, + NOTIFY_BUNDLE_SZ = 32 +}; + +enum { + MIN_HOST_KICK_TIMEOUT = 10, /* in usec */ + MAX_HOST_KICK_TIMEOUT = 100 /* in usec */ +}; + +enum { + MIN_HOST_KICK_ENTRIES = 1, + MAX_HOST_KICK_ENTRIES = 128 +}; + +enum { + MIN_HOST_KICK_BYTES = 0, + MAX_HOST_KICK_BYTES = 5000 +}; + +enum { + DEFAULT_NO_PATH_TIMEOUT = 10000, + DEFAULT_PRIMARY_CONNECT_TIMEOUT = 10000, + DEFAULT_PRIMARY_RECONNECT_TIMEOUT = 10000, + DEFAULT_PRIMARY_SWITCH_TIMEOUT = 10000 +}; + +enum { + VIPORT_STATS_INTERVAL = 500, /* .5 sec */ + VIPORT_HEARTBEAT_INTERVAL = 1000, /* 1 second */ + VIPORT_HEARTBEAT_TIMEOUT = 64000 /* 64 sec */ +}; + +enum { + /* 5 sec increased for EVIC support for large number of + * host connections + */ + CONTROL_RSP_TIMEOUT = 5000, + MIN_CONTROL_RSP_TIMEOUT = 1000, /* 1 sec */ + MAX_CONTROL_RSP_TIMEOUT = 60000 /* 60 sec */ +}; + +/* Maximum number of 
completions to be processed + * during a single completion callback invocation + */ +enum { + DEFAULT_COMPLETION_LIMIT = 100, + MIN_COMPLETION_LIMIT = 10 +}; + +/* infiniband connection parameters */ +enum { + RETRY_COUNT = 3, + MIN_RNR_TIMER = 22, /* 20 ms */ + DEFAULT_PKEY = 0 /* pkey table index */ +}; + +enum { + SA_PATH_REC_GET_TIMEOUT = 1000, /* 1000 ms */ + MIN_SA_TIMEOUT = 100, /* 100 ms */ + MAX_SA_TIMEOUT = 20000 /* 20s */ +}; + +#define MAX_PARAM_VALUE 0x40000000 +#define VNIC_USE_RX_CSUM 1 +#define VNIC_USE_TX_CSUM 1 +#define DEFAULT_PREFER_PRIMARY 0 + +/* As per IBTA specification, IOCString Maximum length can be 512 bits. */ +#define MAX_IOC_STRING_LEN (512/8) + +struct path_param { + __be64 ioc_guid; + u8 ioc_string[MAX_IOC_STRING_LEN+1]; + u8 port; + u8 instance; + struct ib_device *ibdev; + struct vnic_ib_port *ibport; + char name[IFNAMSIZ]; + u8 dgid[16]; + __be16 pkey; + int rx_csum; + int tx_csum; + int heartbeat; + int ib_multicast; +}; + +struct vnic_ib_config { + __be64 service_id; + struct vnic_connection_data conn_data; + u32 retry_count; + u32 rnr_retry_count; + u8 min_rnr_timer; + u32 num_sends; + u32 num_recvs; + u32 recv_scatter; /* 1 */ + u32 send_gather; /* 1 or 2 */ + u32 completion_limit; +}; + +struct control_config { + struct vnic_ib_config ib_config; + u32 num_recvs; + u8 vnic_instance; + u16 max_address_entries; + u16 min_address_entries; + u32 rsp_timeout; + u32 ib_multicast; +}; + +struct data_config { + struct vnic_ib_config ib_config; + u64 path_id; + u32 num_recvs; + u32 host_recv_pool_entries; + struct vnic_recv_pool_config host_min; + struct vnic_recv_pool_config host_max; + struct vnic_recv_pool_config eioc_min; + struct vnic_recv_pool_config eioc_max; + u32 notify_bundle; +}; + +struct viport_config { + struct viport *viport; + struct control_config control_config; + struct data_config data_config; + struct vnic_ib_path_info path_info; + u32 sa_path_rec_get_timeout; + struct ib_device *ibdev; + u32 port; + u32 
stats_interval; + u32 hb_interval; + u32 hb_timeout; + __be64 ioc_guid; + u8 ioc_string[MAX_IOC_STRING_LEN+1]; + size_t path_idx; +}; + +/* + * primary_connect_timeout - if the secondary connects first, + * how long do we give the primary? + * primary_reconnect_timeout - same as above, but used when recovering + * from the case where both paths fail + * primary_switch_timeout - how long do we wait before switching to the + * primary when it comes back? + */ +struct vnic_config { + struct vnic *vnic; + char name[IFNAMSIZ]; + u32 no_path_timeout; + u32 primary_connect_timeout; + u32 primary_reconnect_timeout; + u32 primary_switch_timeout; + int prefer_primary; + int use_rx_csum; + int use_tx_csum; +}; + +int config_start(void); +struct viport_config *config_alloc_viport(struct path_param *params); +struct vnic_config *config_alloc_vnic(void); +char *config_viport_name(struct viport_config *config); + +#endif /* VNIC_CONFIG_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:18:55 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:48:55 +0530 Subject: [ofa-general] [PATCH 06/13] QLogic VNIC: IB core stack interaction In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171855.31725.89658.stgit@localhost.localdomain> From: Ramachandra K This patch implements the interaction of the QLogic VNIC driver with the underlying core InfiniBand stack.
Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c | 1046 ++++++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h | 206 ++++++ 2 files changed, 1252 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c new file mode 100644 index 0000000..3bf6455 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c @@ -0,0 +1,1046 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_data.h" +#include "vnic_config.h" +#include "vnic_ib.h" +#include "vnic_viport.h" +#include "vnic_sys.h" +#include "vnic_main.h" +#include "vnic_stats.h" + +static int vnic_ib_inited; +static void vnic_add_one(struct ib_device *device); +static void vnic_remove_one(struct ib_device *device); +static int vnic_defer_completion(void *ptr); + +static int vnic_ib_mc_init_qp(struct mc_data *mc_data, + struct vnic_ib_config *config, + struct ib_pd *pd, + struct viport_config *viport_config); + +static struct ib_client vnic_client = { + .name = "vnic", + .add = vnic_add_one, + .remove = vnic_remove_one +}; + +struct ib_sa_client vnic_sa_client; + +int vnic_ib_init(void) +{ + int ret = -1; + + IB_FUNCTION("vnic_ib_init()\n"); + + /* class has to be registered before + * calling ib_register_client() because, that call + * will trigger vnic_add_port() which will register + * class_device for the port with the parent class + * as vnic_class + */ + ret = class_register(&vnic_class); + if (ret) { + printk(KERN_ERR PFX "couldn't register class" + " infiniband_qlgc_vnic; error %d", ret); + goto out; + } + + ib_sa_register_client(&vnic_sa_client); + ret = ib_register_client(&vnic_client); + if (ret) { + printk(KERN_ERR PFX "couldn't register IB client;" + " error %d", ret); + goto err_ib_reg; + } + + interface_dev.dev.class = &vnic_class; + interface_dev.dev.release = vnic_release_dev; + snprintf(interface_dev.dev.bus_id, + BUS_ID_SIZE, "interfaces"); + init_completion(&interface_dev.released); + ret = device_register(&interface_dev.dev); + if (ret) { + printk(KERN_ERR PFX "couldn't register class interfaces;" + " error %d", ret); + goto err_class_dev; + } + ret = device_create_file(&interface_dev.dev, + &dev_attr_delete_vnic); + if (ret) { + printk(KERN_ERR PFX "couldn't create class file" + " 'delete_vnic'; error %d", ret); + goto err_class_file; + } + + vnic_ib_inited = 1; + + 
return ret; +err_class_file: + device_unregister(&interface_dev.dev); +err_class_dev: + ib_unregister_client(&vnic_client); +err_ib_reg: + ib_sa_unregister_client(&vnic_sa_client); + class_unregister(&vnic_class); +out: + return ret; +} + +static struct vnic_ib_port *vnic_add_port(struct vnic_ib_device *device, + u8 port_num) +{ + struct vnic_ib_port *port; + + port = kzalloc(sizeof *port, GFP_KERNEL); + if (!port) + return NULL; + + init_completion(&port->pdev_info.released); + port->dev = device; + port->port_num = port_num; + + port->pdev_info.dev.class = &vnic_class; + port->pdev_info.dev.parent = NULL; + port->pdev_info.dev.release = vnic_release_dev; + snprintf(port->pdev_info.dev.bus_id, BUS_ID_SIZE, + "vnic-%s-%d", device->dev->name, port_num); + + if (device_register(&port->pdev_info.dev)) + goto free_port; + + if (device_create_file(&port->pdev_info.dev, + &dev_attr_create_primary)) + goto err_class; + if (device_create_file(&port->pdev_info.dev, + &dev_attr_create_secondary)) + goto err_class; + + return port; +err_class: + device_unregister(&port->pdev_info.dev); +free_port: + kfree(port); + + return NULL; +} + +static void vnic_add_one(struct ib_device *device) +{ + struct vnic_ib_device *vnic_dev; + struct vnic_ib_port *port; + int s, e, p; + + vnic_dev = kmalloc(sizeof *vnic_dev, GFP_KERNEL); + if (!vnic_dev) + return; + + vnic_dev->dev = device; + INIT_LIST_HEAD(&vnic_dev->port_list); + + if (device->node_type == RDMA_NODE_IB_SWITCH) { + s = 0; + e = 0; + + } else { + s = 1; + e = device->phys_port_cnt; + + } + + for (p = s; p <= e; p++) { + port = vnic_add_port(vnic_dev, p); + if (port) + list_add_tail(&port->list, &vnic_dev->port_list); + } + + ib_set_client_data(device, &vnic_client, vnic_dev); + +} + +static void vnic_remove_one(struct ib_device *device) +{ + struct vnic_ib_device *vnic_dev; + struct vnic_ib_port *port, *tmp_port; + + vnic_dev = ib_get_client_data(device, &vnic_client); + list_for_each_entry_safe(port, tmp_port, + 
&vnic_dev->port_list, list) { + device_unregister(&port->pdev_info.dev); + /* + * wait for sysfs entries to go away, so that no new vnics + * are created + */ + wait_for_completion(&port->pdev_info.released); + kfree(port); + + } + kfree(vnic_dev); + + /* TODO Only those vnic interfaces associated with + * the HCA whose remove event is called should be freed + * Currently all the vnic interfaces are freed + */ + + while (!list_empty(&vnic_list)) { + struct vnic *vnic = + list_entry(vnic_list.next, struct vnic, list_ptrs); + vnic_free(vnic); + } + + vnic_npevent_cleanup(); + viport_cleanup(); + +} + +void vnic_ib_cleanup(void) +{ + IB_FUNCTION("vnic_ib_cleanup()\n"); + + if (!vnic_ib_inited) + return; + + device_unregister(&interface_dev.dev); + wait_for_completion(&interface_dev.released); + + ib_unregister_client(&vnic_client); + ib_sa_unregister_client(&vnic_sa_client); + class_unregister(&vnic_class); +} + +static void vnic_path_rec_completion(int status, + struct ib_sa_path_rec *pathrec, + void *context) +{ + struct vnic_ib_path_info *p = context; + p->status = status; + if (!status) + p->path = *pathrec; + + complete(&p->done); +} + +int vnic_ib_get_path(struct netpath *netpath, struct vnic *vnic) +{ + struct viport_config *config = netpath->viport->config; + int ret = 0; + + init_completion(&config->path_info.done); + IB_INFO("Using SA path rec get time out value of %d\n", + config->sa_path_rec_get_timeout); + config->path_info.path_query_id = + ib_sa_path_rec_get(&vnic_sa_client, + config->ibdev, + config->port, + &config->path_info.path, + IB_SA_PATH_REC_DGID | + IB_SA_PATH_REC_SGID | + IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_PKEY, + config->sa_path_rec_get_timeout, + GFP_KERNEL, + vnic_path_rec_completion, + &config->path_info, + &config->path_info.path_query); + + if (config->path_info.path_query_id < 0) { + IB_ERROR("SA path record query failed; error %d\n", + config->path_info.path_query_id); + ret = config->path_info.path_query_id; + goto out; + } 
+ + wait_for_completion(&config->path_info.done); + + if (config->path_info.status < 0) { + printk(KERN_WARNING PFX "connection not available to dgid " + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x", + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[0]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[2]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[4]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[6]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[8]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[10]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[12]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[14])); + + if (config->path_info.status == -ETIMEDOUT) + printk(KERN_INFO " path query timed out\n"); + else if (config->path_info.status == -EIO) + printk(KERN_INFO " path query sending error\n"); + else + printk(KERN_INFO " error %d\n", + config->path_info.status); + + ret = config->path_info.status; + } +out: + if (ret) + netpath_timer(netpath, vnic->config->no_path_timeout); + + return ret; +} + +static inline void vnic_ib_handle_completions(struct ib_wc *wc, + struct vnic_ib_conn *ib_conn, + u32 *comp_num, + cycles_t *comp_time) +{ + struct io *io; + + io = (struct io *)(wc->wr_id); + vnic_ib_comp_stats(ib_conn, comp_num); + if (wc->status) { + IB_INFO("completion error wc.status %d" + " wc.opcode %d vendor err 0x%x\n", + wc->status, wc->opcode, wc->vendor_err); + } else if (io) { + vnic_ib_io_stats(io, ib_conn, *comp_time); + if (io->type == RECV_UD) { + struct ud_recv_io *recv_io = + container_of(io, struct ud_recv_io, io); + recv_io->len = wc->byte_len; + } + if (io->routine) + (*io->routine) (io); + } +} + +static void ib_qp_event(struct ib_event *event, void *context) +{ + IB_ERROR("QP event %d\n", event->event); +} + +static void vnic_ib_completion(struct ib_cq *cq, void *ptr) +{ + struct 
vnic_ib_conn *ib_conn = ptr; + unsigned long flags; + int compl_received; + struct ib_wc wc; + cycles_t comp_time; + u32 comp_num = 0; + + /* for multicast, cm_id is NULL, so skip that test */ + if (ib_conn->cm_id && + (ib_conn->state != IB_CONN_CONNECTED)) + return; + + /* Check if completion processing is taking place in thread + * If not then process completions in this handler, + * else set compl_received if not set, to indicate that + * there are more completions to process in thread. + */ + + spin_lock_irqsave(&ib_conn->compl_received_lock, flags); + compl_received = ib_conn->compl_received; + spin_unlock_irqrestore(&ib_conn->compl_received_lock, flags); + + if (ib_conn->in_thread || compl_received) { + if (!compl_received) { + spin_lock_irqsave(&ib_conn->compl_received_lock, flags); + ib_conn->compl_received = 1; + spin_unlock_irqrestore(&ib_conn->compl_received_lock, + flags); + } + wake_up(&(ib_conn->callback_wait_queue)); + } else { + vnic_ib_note_comptime_stats(&comp_time); + vnic_ib_callback_stats(ib_conn); + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + while (ib_poll_cq(cq, 1, &wc) > 0) { + vnic_ib_handle_completions(&wc, ib_conn, &comp_num, + &comp_time); + if (ib_conn->cm_id && + ib_conn->state != IB_CONN_CONNECTED) + break; + + /* If we get more completions than the completion limit + * defer completion to the thread + */ + if ((!ib_conn->in_thread) && + (comp_num >= ib_conn->ib_config->completion_limit)) { + ib_conn->in_thread = 1; + spin_lock_irqsave( + &ib_conn->compl_received_lock, flags); + ib_conn->compl_received = 1; + spin_unlock_irqrestore( + &ib_conn->compl_received_lock, flags); + wake_up(&(ib_conn->callback_wait_queue)); + break; + } + + } + vnic_ib_maxio_stats(ib_conn, comp_num); + } +} + +static int vnic_ib_mod_qp_to_rts(struct ib_cm_id *cm_id, + struct vnic_ib_conn *ib_conn) +{ + int attr_mask = 0; + int ret; + struct ib_qp_attr *qp_attr = NULL; + + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); + if (!qp_attr) + return -ENOMEM; + + 
qp_attr->qp_state = IB_QPS_RTR; + + ret = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (ret) + goto out; + + ret = ib_modify_qp(ib_conn->qp, qp_attr, attr_mask); + if (ret) + goto out; + + IB_INFO("QP RTR\n"); + + qp_attr->qp_state = IB_QPS_RTS; + + ret = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (ret) + goto out; + + ret = ib_modify_qp(ib_conn->qp, qp_attr, attr_mask); + if (ret) + goto out; + + IB_INFO("QP RTS\n"); + + ret = ib_send_cm_rtu(cm_id, NULL, 0); + if (ret) + goto out; +out: + kfree(qp_attr); + return ret; +} + +int vnic_ib_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) +{ + struct vnic_ib_conn *ib_conn = cm_id->context; + struct viport *viport = ib_conn->viport; + int err = 0; + + switch (event->event) { + case IB_CM_REQ_ERROR: + IB_ERROR("sending CM REQ failed\n"); + err = 1; + viport->retry = 1; + break; + case IB_CM_REP_RECEIVED: + IB_INFO("CM REP recvd\n"); + if (vnic_ib_mod_qp_to_rts(cm_id, ib_conn)) + err = 1; + else { + ib_conn->state = IB_CONN_CONNECTED; + vnic_ib_connected_time_stats(ib_conn); + IB_INFO("RTU SENT\n"); + } + break; + case IB_CM_REJ_RECEIVED: + printk(KERN_ERR PFX " CM rejected control connection\n"); + if (event->param.rej_rcvd.reason == + IB_CM_REJ_INVALID_SERVICE_ID) + printk(KERN_ERR "reason: invalid service ID. 
" + "IOCGUID value specified may be incorrect\n"); + else + printk(KERN_ERR "reason code : 0x%x\n", + event->param.rej_rcvd.reason); + + err = 1; + viport->retry = 1; + break; + case IB_CM_MRA_RECEIVED: + IB_INFO("CM MRA received\n"); + break; + + case IB_CM_DREP_RECEIVED: + IB_INFO("CM DREP recvd\n"); + ib_conn->state = IB_CONN_DISCONNECTED; + break; + + case IB_CM_TIMEWAIT_EXIT: + IB_ERROR("CM timewait exit\n"); + err = 1; + break; + + default: + IB_INFO("unhandled CM event %d\n", event->event); + break; + + } + + if (err) { + ib_conn->state = IB_CONN_DISCONNECTED; + viport_failure(viport); + } + + viport_kick(viport); + return 0; +} + + +int vnic_ib_cm_connect(struct vnic_ib_conn *ib_conn) +{ + struct ib_cm_req_param *req = NULL; + struct viport *viport; + int ret = -1; + + if (!vnic_ib_conn_initted(ib_conn)) { + IB_ERROR("IB Connection out of state for CM connect (%d)\n", + ib_conn->state); + return -EINVAL; + } + + vnic_ib_conntime_stats(ib_conn); + req = kzalloc(sizeof *req, GFP_KERNEL); + if (!req) + return -ENOMEM; + + viport = ib_conn->viport; + + req->primary_path = &viport->config->path_info.path; + req->alternate_path = NULL; + req->qp_num = ib_conn->qp->qp_num; + req->qp_type = ib_conn->qp->qp_type; + req->service_id = ib_conn->ib_config->service_id; + req->private_data = &ib_conn->ib_config->conn_data; + req->private_data_len = sizeof(struct vnic_connection_data); + req->flow_control = 1; + + get_random_bytes(&req->starting_psn, 4); + req->starting_psn &= 0xffffff; + + /* + * Both responder_resources and initiator_depth are set to zero + * as we do not need RDMA read. + * + * They also must be set to zero, otherwise data connections + * are rejected by VEx. 
+ */ + req->responder_resources = 0; + req->initiator_depth = 0; + req->remote_cm_response_timeout = 20; + req->local_cm_response_timeout = 20; + req->retry_count = ib_conn->ib_config->retry_count; + req->rnr_retry_count = ib_conn->ib_config->rnr_retry_count; + req->max_cm_retries = 15; + + ib_conn->state = IB_CONN_CONNECTING; + + ret = ib_send_cm_req(ib_conn->cm_id, req); + + kfree(req); + + if (ret) { + IB_ERROR("CM REQ sending failed; error %d \n", ret); + ib_conn->state = IB_CONN_DISCONNECTED; + } + + return ret; +} + +static int vnic_ib_init_qp(struct vnic_ib_conn *ib_conn, + struct vnic_ib_config *config, + struct ib_pd *pd, + struct viport_config *viport_config) +{ + struct ib_qp_init_attr *init_attr; + struct ib_qp_attr *attr; + int ret; + + init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); + if (!init_attr) + return -ENOMEM; + + init_attr->event_handler = ib_qp_event; + init_attr->cap.max_send_wr = config->num_sends; + init_attr->cap.max_recv_wr = config->num_recvs; + init_attr->cap.max_recv_sge = config->recv_scatter; + init_attr->cap.max_send_sge = config->send_gather; + init_attr->sq_sig_type = IB_SIGNAL_ALL_WR; + init_attr->qp_type = IB_QPT_RC; + init_attr->send_cq = ib_conn->cq; + init_attr->recv_cq = ib_conn->cq; + + ib_conn->qp = ib_create_qp(pd, init_attr); + + if (IS_ERR(ib_conn->qp)) { + ret = -1; + IB_ERROR("could not create QP\n"); + goto free_init_attr; + } + + attr = kmalloc(sizeof *attr, GFP_KERNEL); + if (!attr) { + ret = -ENOMEM; + goto destroy_qp; + } + + ret = ib_find_cached_pkey(viport_config->ibdev, + viport_config->port, + be16_to_cpu(viport_config->path_info.path. 
+ pkey), + &attr->pkey_index); + if (ret) { + printk(KERN_WARNING PFX "ib_find_cached_pkey() failed; " + "error %d\n", ret); + goto freeattr; + } + + attr->qp_state = IB_QPS_INIT; + attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE; + attr->port_num = viport_config->port; + + ret = ib_modify_qp(ib_conn->qp, attr, + IB_QP_STATE | + IB_QP_PKEY_INDEX | + IB_QP_ACCESS_FLAGS | IB_QP_PORT); + if (ret) { + printk(KERN_WARNING PFX "could not modify QP; error %d \n", + ret); + goto freeattr; + } + + kfree(attr); + kfree(init_attr); + return ret; + +freeattr: + kfree(attr); +destroy_qp: + ib_destroy_qp(ib_conn->qp); +free_init_attr: + kfree(init_attr); + return ret; +} + +int vnic_ib_conn_init(struct vnic_ib_conn *ib_conn, struct viport *viport, + struct ib_pd *pd, struct vnic_ib_config *config) +{ + struct viport_config *viport_config = viport->config; + int ret = -1; + unsigned int cq_size = config->num_sends + config->num_recvs; + + + if (!vnic_ib_conn_uninitted(ib_conn)) { + IB_ERROR("IB Connection out of state for init (%d)\n", + ib_conn->state); + return -EINVAL; + } + + ib_conn->cq = ib_create_cq(viport_config->ibdev, vnic_ib_completion, +#ifdef BUILD_FOR_OFED_1_2 + NULL, ib_conn, cq_size); +#else + NULL, ib_conn, cq_size, 0); +#endif + if (IS_ERR(ib_conn->cq)) { + IB_ERROR("could not create CQ\n"); + goto out; + } + + IB_INFO("cq created %p %d\n", ib_conn->cq, cq_size); + ib_req_notify_cq(ib_conn->cq, IB_CQ_NEXT_COMP); + init_waitqueue_head(&(ib_conn->callback_wait_queue)); + init_completion(&(ib_conn->callback_thread_exit)); + + spin_lock_init(&ib_conn->compl_received_lock); + + ib_conn->callback_thread = kthread_run(vnic_defer_completion, ib_conn, + "qlgc_vnic_def_compl"); + if (IS_ERR(ib_conn->callback_thread)) { + IB_ERROR("Could not create vnic_callback_thread;" + " error %d\n", (int) PTR_ERR(ib_conn->callback_thread)); + ib_conn->callback_thread = NULL; + goto destroy_cq; + } + + ret = vnic_ib_init_qp(ib_conn, config, pd, viport_config); + + if (ret) + goto 
destroy_thread; + + spin_lock_init(&ib_conn->conn_lock); + ib_conn->state = IB_CONN_INITTED; + + return ret; + +destroy_thread: + completion_callback_cleanup(ib_conn); +destroy_cq: + ib_destroy_cq(ib_conn->cq); +out: + return ret; +} + +int vnic_ib_post_recv(struct vnic_ib_conn *ib_conn, struct io *io) +{ + cycles_t post_time; + struct ib_recv_wr *bad_wr; + int ret = -1; + unsigned long flags; + + IB_FUNCTION("vnic_ib_post_recv()\n"); + + spin_lock_irqsave(&ib_conn->conn_lock, flags); + + if (!vnic_ib_conn_initted(ib_conn) && + !vnic_ib_conn_connected(ib_conn)) { + ret = -EINVAL; + goto out; + } + + vnic_ib_pre_rcvpost_stats(ib_conn, io, &post_time); + io->type = RECV; + ret = ib_post_recv(ib_conn->qp, &io->rwr, &bad_wr); + if (ret) { + IB_ERROR("error in posting rcv wr; error %d\n", ret); + ib_conn->state = IB_CONN_ERRORED; + goto out; + } + + vnic_ib_post_rcvpost_stats(ib_conn, post_time); +out: + spin_unlock_irqrestore(&ib_conn->conn_lock, flags); + return ret; + +} + +int vnic_ib_post_send(struct vnic_ib_conn *ib_conn, struct io *io) +{ + cycles_t post_time; + unsigned long flags; + struct ib_send_wr *bad_wr; + int ret = -1; + + IB_FUNCTION("vnic_ib_post_send()\n"); + + spin_lock_irqsave(&ib_conn->conn_lock, flags); + if (!vnic_ib_conn_connected(ib_conn)) { + IB_ERROR("IB Connection out of state for" + " posting sends (%d)\n", ib_conn->state); + goto out; + } + + vnic_ib_pre_sendpost_stats(io, &post_time); + if (io->swr.opcode == IB_WR_RDMA_WRITE) + io->type = RDMA; + else + io->type = SEND; + + ret = ib_post_send(ib_conn->qp, &io->swr, &bad_wr); + if (ret) { + IB_ERROR("error in posting send wr; error %d\n", ret); + ib_conn->state = IB_CONN_ERRORED; + goto out; + } + + vnic_ib_post_sendpost_stats(ib_conn, io, post_time); +out: + spin_unlock_irqrestore(&ib_conn->conn_lock, flags); + return ret; +} + +static int vnic_defer_completion(void *ptr) +{ + struct vnic_ib_conn *ib_conn = ptr; + struct ib_wc wc; + struct ib_cq *cq = ib_conn->cq; + cycles_t comp_time; + 
u32 comp_num = 0; + unsigned long flags; + + while (!ib_conn->callback_thread_end) { + wait_event_interruptible(ib_conn->callback_wait_queue, + ib_conn->compl_received || + ib_conn->callback_thread_end); + ib_conn->in_thread = 1; + spin_lock_irqsave(&ib_conn->compl_received_lock, flags); + ib_conn->compl_received = 0; + spin_unlock_irqrestore(&ib_conn->compl_received_lock, flags); + if (ib_conn->cm_id && + ib_conn->state != IB_CONN_CONNECTED) + goto out_thread; + + vnic_ib_note_comptime_stats(&comp_time); + vnic_ib_callback_stats(ib_conn); + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + while (ib_poll_cq(cq, 1, &wc) > 0) { + vnic_ib_handle_completions(&wc, ib_conn, &comp_num, + &comp_time); + if (ib_conn->cm_id && + ib_conn->state != IB_CONN_CONNECTED) + break; + } + vnic_ib_maxio_stats(ib_conn, comp_num); +out_thread: + ib_conn->in_thread = 0; + } + complete_and_exit(&(ib_conn->callback_thread_exit), 0); + return 0; +} + +void completion_callback_cleanup(struct vnic_ib_conn *ib_conn) +{ + if (ib_conn->callback_thread) { + ib_conn->callback_thread_end = 1; + wake_up(&(ib_conn->callback_wait_queue)); + wait_for_completion(&(ib_conn->callback_thread_exit)); + ib_conn->callback_thread = NULL; + } +} + +int vnic_ib_mc_init(struct mc_data *mc_data, struct viport *viport, + struct ib_pd *pd, struct vnic_ib_config *config) +{ + struct viport_config *viport_config = viport->config; + int ret = -1; + unsigned int cq_size = config->num_recvs; /* recvs only */ + + IB_FUNCTION("vnic_ib_mc_init\n"); + + mc_data->ib_conn.cq = ib_create_cq(viport_config->ibdev, vnic_ib_completion, +#ifdef BUILD_FOR_OFED_1_2 + NULL, &mc_data->ib_conn, cq_size); +#else + NULL, &mc_data->ib_conn, cq_size, 0); +#endif + if (IS_ERR(mc_data->ib_conn.cq)) { + IB_ERROR("ib_create_cq failed\n"); + goto out; + } + IB_INFO("mc cq created %p %d\n", mc_data->ib_conn.cq, cq_size); + + ret = ib_req_notify_cq(mc_data->ib_conn.cq, IB_CQ_NEXT_COMP); + if (ret) { + IB_ERROR("ib_req_notify_cq failed %x \n", ret); + goto 
destroy_cq; + } + + init_waitqueue_head(&(mc_data->ib_conn.callback_wait_queue)); + init_completion(&(mc_data->ib_conn.callback_thread_exit)); + + spin_lock_init(&mc_data->ib_conn.compl_received_lock); + mc_data->ib_conn.callback_thread = kthread_run(vnic_defer_completion, + &mc_data->ib_conn, + "qlgc_vnic_mc_def_compl"); + if (IS_ERR(mc_data->ib_conn.callback_thread)) { + IB_ERROR("Could not create vnic_callback_thread for MULTICAST;" + " error %d\n", + (int) PTR_ERR(mc_data->ib_conn.callback_thread)); + mc_data->ib_conn.callback_thread = NULL; + goto destroy_cq; + } + IB_INFO("callback_thread created\n"); + + ret = vnic_ib_mc_init_qp(mc_data, config, pd, viport_config); + if (ret) + goto destroy_thread; + + spin_lock_init(&mc_data->ib_conn.conn_lock); + mc_data->ib_conn.state = IB_CONN_INITTED; /* stays in this state */ + + return ret; + +destroy_thread: + completion_callback_cleanup(&mc_data->ib_conn); +destroy_cq: + ib_destroy_cq(mc_data->ib_conn.cq); + mc_data->ib_conn.cq = (struct ib_cq *)ERR_PTR(-EINVAL); +out: + return ret; +} + +static int vnic_ib_mc_init_qp(struct mc_data *mc_data, + struct vnic_ib_config *config, + struct ib_pd *pd, + struct viport_config *viport_config) +{ + struct ib_qp_init_attr *init_attr; + struct ib_qp_attr *qp_attr; + int ret; + + IB_FUNCTION("vnic_ib_mc_init_qp\n"); + + if (!mc_data->ib_conn.cq) { + IB_ERROR("cq is null\n"); + return -ENOMEM; + } + + init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); + if (!init_attr) { + IB_ERROR("failed to alloc init_attr\n"); + return -ENOMEM; + } + + init_attr->cap.max_recv_wr = config->num_recvs; + init_attr->cap.max_send_wr = 1; + init_attr->cap.max_recv_sge = 2; + init_attr->cap.max_send_sge = 1; + + /* Completion for all work requests. 
*/ + init_attr->sq_sig_type = IB_SIGNAL_ALL_WR; + + init_attr->qp_type = IB_QPT_UD; + + init_attr->send_cq = mc_data->ib_conn.cq; + init_attr->recv_cq = mc_data->ib_conn.cq; + + IB_INFO("creating qp %d \n", config->num_recvs); + + mc_data->ib_conn.qp = ib_create_qp(pd, init_attr); + + if (IS_ERR(mc_data->ib_conn.qp)) { + ret = -1; + IB_ERROR("could not create QP\n"); + goto free_init_attr; + } + + qp_attr = kzalloc(sizeof *qp_attr, GFP_KERNEL); + if (!qp_attr) { + ret = -ENOMEM; + goto destroy_qp; + } + + qp_attr->qp_state = IB_QPS_INIT; + qp_attr->port_num = viport_config->port; + qp_attr->qkey = IOC_NUMBER(be64_to_cpu(viport_config->ioc_guid)); + qp_attr->pkey_index = 0; + /* cannot set access flags for UD qp + qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE; */ + + IB_INFO("port_num:%d qkey:%d pkey:%d\n", qp_attr->port_num, + qp_attr->qkey, qp_attr->pkey_index); + ret = ib_modify_qp(mc_data->ib_conn.qp, qp_attr, + IB_QP_STATE | + IB_QP_PKEY_INDEX | + IB_QP_QKEY | + + /* cannot set this for UD + IB_QP_ACCESS_FLAGS | */ + + IB_QP_PORT); + if (ret) { + IB_ERROR("ib_modify_qp to INIT failed %d \n", ret); + goto free_qp_attr; + } + + kfree(qp_attr); + kfree(init_attr); + return ret; + +free_qp_attr: + kfree(qp_attr); +destroy_qp: + ib_destroy_qp(mc_data->ib_conn.qp); + mc_data->ib_conn.qp = ERR_PTR(-EINVAL); +free_init_attr: + kfree(init_attr); + return ret; +} + +int vnic_ib_mc_mod_qp_to_rts(struct ib_qp *qp) +{ + int ret; + struct ib_qp_attr *qp_attr = NULL; + + IB_FUNCTION("vnic_ib_mc_mod_qp_to_rts\n"); + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); + if (!qp_attr) + return -ENOMEM; + + memset(qp_attr, 0, sizeof *qp_attr); + qp_attr->qp_state = IB_QPS_RTR; + + ret = ib_modify_qp(qp, qp_attr, IB_QP_STATE); + if (ret) { + IB_ERROR("ib_modify_qp to RTR failed %d\n", ret); + goto out; + } + IB_INFO("MC QP RTR\n"); + + memset(qp_attr, 0, sizeof *qp_attr); + qp_attr->qp_state = IB_QPS_RTS; + qp_attr->sq_psn = 0; + + ret = ib_modify_qp(qp, qp_attr, IB_QP_STATE | 
IB_QP_SQ_PSN); + if (ret) { + IB_ERROR("ib_modify_qp to RTS failed %d\n", ret); + goto out; + } + IB_INFO("MC QP RTS\n"); + + ret = 0; + +out: + kfree(qp_attr); + return ret; +} + +int vnic_ib_mc_post_recv(struct mc_data *mc_data, struct io *io) +{ + cycles_t post_time; + struct ib_recv_wr *bad_wr; + int ret = -1; + + IB_FUNCTION("vnic_ib_mc_post_recv()\n"); + + vnic_ib_pre_rcvpost_stats(&mc_data->ib_conn, io, &post_time); + io->type = RECV_UD; + ret = ib_post_recv(mc_data->ib_conn.qp, &io->rwr, &bad_wr); + if (ret) { + IB_ERROR("error in posting rcv wr; error %d\n", ret); + goto out; + } + vnic_ib_post_rcvpost_stats(&mc_data->ib_conn, post_time); + +out: + return ret; +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h new file mode 100644 index 0000000..ebf9ef5 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h @@ -0,0 +1,206 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution.
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_IB_H_INCLUDED +#define VNIC_IB_H_INCLUDED + +#include +#include +#include +#include +#include +#include + +#include "vnic_sys.h" +#include "vnic_netpath.h" +#define PFX "qlgc_vnic: " + +struct io; +typedef void (comp_routine_t) (struct io *io); + +enum vnic_ib_conn_state { + IB_CONN_UNINITTED = 0, + IB_CONN_INITTED = 1, + IB_CONN_CONNECTING = 2, + IB_CONN_CONNECTED = 3, + IB_CONN_DISCONNECTED = 4, + IB_CONN_ERRORED = 5 +}; + +struct vnic_ib_conn { + struct viport *viport; + struct vnic_ib_config *ib_config; + spinlock_t conn_lock; + enum vnic_ib_conn_state state; + struct ib_qp *qp; + struct ib_cq *cq; + struct ib_cm_id *cm_id; + int callback_thread_end; + struct task_struct *callback_thread; + wait_queue_head_t callback_wait_queue; + u32 in_thread; + u32 compl_received; + struct completion callback_thread_exit; + spinlock_t compl_received_lock; +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + struct { + cycles_t connection_time; + cycles_t rdma_post_time; + u32 rdma_post_ios; + cycles_t rdma_comp_time; + u32 rdma_comp_ios; + cycles_t send_post_time; + u32 send_post_ios; + cycles_t send_comp_time; + u32 send_comp_ios; + cycles_t recv_post_time; + u32 recv_post_ios; + cycles_t recv_comp_time; + u32 recv_comp_ios; + u32 num_ios; + u32 num_callbacks; + u32 max_ios; + } statistics; +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ +}; + +struct vnic_ib_path_info { + struct ib_sa_path_rec path; + struct ib_sa_query *path_query; + int path_query_id; + int status; + 
struct completion done; +}; + +struct vnic_ib_device { + struct ib_device *dev; + struct list_head port_list; +}; + +struct vnic_ib_port { + struct vnic_ib_device *dev; + u8 port_num; + struct dev_info pdev_info; + struct list_head list; +}; + +struct io { + struct list_head list_ptrs; + struct viport *viport; + comp_routine_t *routine; + struct ib_recv_wr rwr; + struct ib_send_wr swr; +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + cycles_t time; +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ + enum {RECV, RDMA, SEND, RECV_UD} type; +}; + +struct rdma_io { + struct io io; + struct ib_sge list[2]; + u16 index; + u16 len; + u8 *data; + dma_addr_t data_dma; + struct sk_buff *skb; + dma_addr_t skb_data_dma; + struct viport_trailer *trailer; + dma_addr_t trailer_dma; +}; + +struct send_io { + struct io io; + struct ib_sge list; + u8 *virtual_addr; +}; + +struct recv_io { + struct io io; + struct ib_sge list; + u8 *virtual_addr; +}; + +struct ud_recv_io { + struct io io; + u16 len; + dma_addr_t skb_data_dma; + struct ib_sge list[2]; /* one for grh and other for rest of pkt. 
*/ + struct sk_buff *skb; +}; + +int vnic_ib_init(void); +void vnic_ib_cleanup(void); + +struct vnic; +int vnic_ib_get_path(struct netpath *netpath, struct vnic *vnic); +int vnic_ib_conn_init(struct vnic_ib_conn *ib_conn, struct viport *viport, + struct ib_pd *pd, struct vnic_ib_config *config); + +int vnic_ib_post_recv(struct vnic_ib_conn *ib_conn, struct io *io); +int vnic_ib_post_send(struct vnic_ib_conn *ib_conn, struct io *io); +int vnic_ib_cm_connect(struct vnic_ib_conn *ib_conn); +int vnic_ib_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event); + +#define vnic_ib_conn_uninitted(ib_conn) \ + ((ib_conn)->state == IB_CONN_UNINITTED) +#define vnic_ib_conn_initted(ib_conn) \ + ((ib_conn)->state == IB_CONN_INITTED) +#define vnic_ib_conn_connecting(ib_conn) \ + ((ib_conn)->state == IB_CONN_CONNECTING) +#define vnic_ib_conn_connected(ib_conn) \ + ((ib_conn)->state == IB_CONN_CONNECTED) +#define vnic_ib_conn_disconnected(ib_conn) \ + ((ib_conn)->state == IB_CONN_DISCONNECTED) + +#define MCAST_GROUP_INVALID 0x00 /* viport failed to join or left mc group */ +#define MCAST_GROUP_JOINING 0x01 /* wait for completion */ +#define MCAST_GROUP_JOINED 0x02 /* join process completed successfully */ + +/* vnic_sa_client is used to register with sa once. It is needed to join and + * leave multicast groups. + */ +extern struct ib_sa_client vnic_sa_client; + +/* The following functions are using initialize and handle multicast + * components. 
+ */ +struct mc_data; /* forward declaration */ +/* Initialize all necessary mc components */ +int vnic_ib_mc_init(struct mc_data *mc_data, struct viport *viport, + struct ib_pd *pd, struct vnic_ib_config *config); +/* Put multicast qp in RTS */ +int vnic_ib_mc_mod_qp_to_rts(struct ib_qp *qp); +/* Post multicast receive buffers */ +int vnic_ib_mc_post_recv(struct mc_data *mc_data, struct io *io); + +#endif /* VNIC_IB_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:19:55 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:49:55 +0530 Subject: [ofa-general] [PATCH 08/13] QLogic VNIC: sysfs interface implementation for the driver In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171955.31725.7771.stgit@localhost.localdomain> From: Amar Mudrankit The sysfs interface for the QLogic VNIC driver is implemented through this patch. Signed-off-by: Ramachandra K Signed-off-by: Poornima Kamath --- drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c | 1127 +++++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h | 62 + 2 files changed, 1189 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c new file mode 100644 index 0000000..7e70b0c --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c @@ -0,0 +1,1127 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_config.h" +#include "vnic_ib.h" +#include "vnic_viport.h" +#include "vnic_main.h" +#include "vnic_stats.h" + +/* + * target eiocs are added by writing + * + * ioc_guid=<ioc_guid>,dgid=<dgid>,pkey=<pkey>,name=<name> + * + * to the create_primary sysfs attribute.
+ */ +enum { + VNIC_OPT_ERR = 0, + VNIC_OPT_IOC_GUID = 1 << 0, + VNIC_OPT_DGID = 1 << 1, + VNIC_OPT_PKEY = 1 << 2, + VNIC_OPT_NAME = 1 << 3, + VNIC_OPT_INSTANCE = 1 << 4, + VNIC_OPT_RXCSUM = 1 << 5, + VNIC_OPT_TXCSUM = 1 << 6, + VNIC_OPT_HEARTBEAT = 1 << 7, + VNIC_OPT_IOC_STRING = 1 << 8, + VNIC_OPT_IB_MULTICAST = 1 << 9, + VNIC_OPT_ALL = (VNIC_OPT_IOC_GUID | + VNIC_OPT_DGID | VNIC_OPT_NAME | VNIC_OPT_PKEY), +}; + +static match_table_t vnic_opt_tokens = { + {VNIC_OPT_IOC_GUID, "ioc_guid=%s"}, + {VNIC_OPT_DGID, "dgid=%s"}, + {VNIC_OPT_PKEY, "pkey=%x"}, + {VNIC_OPT_NAME, "name=%s"}, + {VNIC_OPT_INSTANCE, "instance=%d"}, + {VNIC_OPT_RXCSUM, "rx_csum=%s"}, + {VNIC_OPT_TXCSUM, "tx_csum=%s"}, + {VNIC_OPT_HEARTBEAT, "heartbeat=%d"}, + {VNIC_OPT_IOC_STRING, "ioc_string=\"%s"}, + {VNIC_OPT_IB_MULTICAST, "ib_multicast=%s"}, + {VNIC_OPT_ERR, NULL} +}; + +void vnic_release_dev(struct device *dev) +{ + struct dev_info *dev_info = + container_of(dev, struct dev_info, dev); + + complete(&dev_info->released); + +} + +struct class vnic_class = { + .name = "infiniband_qlgc_vnic", + .dev_release = vnic_release_dev +}; + +struct dev_info interface_dev; + +DEVICE_ATTR(create_primary, S_IWUSR, NULL, vnic_create_primary); +DEVICE_ATTR(create_secondary, S_IWUSR, NULL, vnic_create_secondary); +DEVICE_ATTR(delete_vnic, S_IWUSR, NULL, vnic_delete); + +static int vnic_parse_options(const char *buf, struct path_param *param) +{ + char *options, *sep_opt; + char *p; + char dgid[3]; + substring_t args[MAX_OPT_ARGS]; + int opt_mask = 0; + int token; + int ret = -EINVAL; + int i, len; + + options = kstrdup(buf, GFP_KERNEL); + if (!options) + return -ENOMEM; + + sep_opt = options; + while ((p = strsep(&sep_opt, ",")) != NULL) { + if (!*p) + continue; + + token = match_token(p, vnic_opt_tokens, args); + opt_mask |= token; + + switch (token) { + case VNIC_OPT_IOC_GUID: + p = match_strdup(args); + param->ioc_guid = cpu_to_be64(simple_strtoull(p, NULL, + 16)); + kfree(p); + break; + + case 
VNIC_OPT_DGID: + p = match_strdup(args); + if (strlen(p) != 32) { + printk(KERN_WARNING PFX + "bad dest GID parameter '%s'\n", p); + kfree(p); + goto out; + } + + for (i = 0; i < 16; ++i) { + strlcpy(dgid, p + i * 2, 3); + param->dgid[i] = simple_strtoul(dgid, NULL, + 16); + + } + kfree(p); + break; + + case VNIC_OPT_PKEY: + if (match_hex(args, &token)) { + printk(KERN_WARNING PFX + "bad P_key parameter '%s'\n", p); + goto out; + } + param->pkey = cpu_to_be16(token); + break; + + case VNIC_OPT_NAME: + p = match_strdup(args); + if (strlen(p) >= IFNAMSIZ) { + printk(KERN_WARNING PFX + "interface name parameter too long\n"); + kfree(p); + goto out; + } + strcpy(param->name, p); + kfree(p); + break; + case VNIC_OPT_INSTANCE: + if (match_int(args, &token)) { + printk(KERN_WARNING PFX + "bad instance parameter '%s'\n", p); + goto out; + } + + if (token > 255 || token < 0) { + printk(KERN_WARNING PFX + "instance parameter must be" + " >= 0 and <= 255\n"); + goto out; + } + + param->instance = token; + break; + case VNIC_OPT_RXCSUM: + p = match_strdup(args); + if (!strncmp(p, "true", 4)) + param->rx_csum = 1; + else if (!strncmp(p, "false", 5)) + param->rx_csum = 0; + else { + printk(KERN_WARNING PFX + "bad rx_csum parameter." + " must be 'true' or 'false'\n"); + kfree(p); + goto out; + } + kfree(p); + break; + case VNIC_OPT_TXCSUM: + p = match_strdup(args); + if (!strncmp(p, "true", 4)) + param->tx_csum = 1; + else if (!strncmp(p, "false", 5)) + param->tx_csum = 0; + else { + printk(KERN_WARNING PFX + "bad tx_csum parameter." 
+ " must be 'true' or 'false'\n"); + kfree(p); + goto out; + } + kfree(p); + break; + case VNIC_OPT_HEARTBEAT: + if (match_int(args, &token)) { + printk(KERN_WARNING PFX + "bad heartbeat parameter '%s'\n", p); + goto out; + } + + if (token > 6000 || token <= 0) { + printk(KERN_WARNING PFX + "heartbeat parameter must be" + " > 0 and <= 6000\n"); + goto out; + } + param->heartbeat = token; + break; + case VNIC_OPT_IOC_STRING: + p = match_strdup(args); + len = strlen(p); + if (len > MAX_IOC_STRING_LEN) { + printk(KERN_WARNING PFX + "ioc string parameter too long\n"); + kfree(p); + goto out; + } + strcpy(param->ioc_string, p); + if (*(p + len - 1) != '\"') { + strcat(param->ioc_string, ","); + kfree(p); + p = strsep(&sep_opt, "\""); + strcat(param->ioc_string, p); + sep_opt++; + } else { + *(param->ioc_string + len - 1) = '\0'; + kfree(p); + } + break; + case VNIC_OPT_IB_MULTICAST: + p = match_strdup(args); + if (!strncmp(p, "true", 4)) + param->ib_multicast = 1; + else if (!strncmp(p, "false", 5)) + param->ib_multicast = 0; + else { + printk(KERN_WARNING PFX + "bad ib_multicast parameter." 
+ " must be 'true' or 'false'\n"); + kfree(p); + goto out; + } + kfree(p); + break; + default: + printk(KERN_WARNING PFX + "unknown parameter or missing value " + "'%s' in target creation request\n", p); + goto out; + } + + } + + if ((opt_mask & VNIC_OPT_ALL) == VNIC_OPT_ALL) + ret = 0; + else + for (i = 0; i < ARRAY_SIZE(vnic_opt_tokens); ++i) + if ((vnic_opt_tokens[i].token & VNIC_OPT_ALL) && + !(vnic_opt_tokens[i].token & opt_mask)) + printk(KERN_WARNING PFX + "target creation request is " + "missing parameter '%s'\n", + vnic_opt_tokens[i].pattern); + +out: + kfree(options); + return ret; + +} + +static ssize_t show_vnic_state(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, dev_info); + switch (vnic->state) { + case VNIC_UNINITIALIZED: + return sprintf(buf, "VNIC_UNINITIALIZED\n"); + case VNIC_REGISTERED: + return sprintf(buf, "VNIC_REGISTERED\n"); + default: + return sprintf(buf, "INVALID STATE\n"); + } + +} + +static DEVICE_ATTR(vnic_state, S_IRUGO, show_vnic_state, NULL); + +static ssize_t show_rx_csum(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, dev_info); + + if (vnic->config->use_rx_csum) + return sprintf(buf, "true\n"); + else + return sprintf(buf, "false\n"); +} + +static DEVICE_ATTR(rx_csum, S_IRUGO, show_rx_csum, NULL); + +static ssize_t show_tx_csum(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, dev_info); + + if (vnic->config->use_tx_csum) + return sprintf(buf, "true\n"); + else + return sprintf(buf, "false\n"); +} + +static DEVICE_ATTR(tx_csum, S_IRUGO, show_tx_csum, NULL); + +static ssize_t show_current_path(struct 
device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, dev_info); + + if (vnic->current_path == &vnic->primary_path) + return sprintf(buf, "primary path\n"); + else if (vnic->current_path == &vnic->secondary_path) + return sprintf(buf, "secondary path\n"); + else + return sprintf(buf, "none\n"); + +} + +static DEVICE_ATTR(current_path, S_IRUGO, show_current_path, NULL); + +static struct attribute *vnic_dev_attrs[] = { + &dev_attr_vnic_state.attr, + &dev_attr_rx_csum.attr, + &dev_attr_tx_csum.attr, + &dev_attr_current_path.attr, + NULL +}; + +struct attribute_group vnic_dev_attr_group = { + .attrs = vnic_dev_attrs, +}; + +static inline void print_dgid(u8 *dgid) +{ + int i; + + for (i = 0; i < 16; i += 2) + printk("%04x", be16_to_cpu(*(__be16 *)&dgid[i])); +} + +static inline int is_dgid_zero(u8 *dgid) +{ + int i; + + for (i = 0; i < 16; i++) { + if (dgid[i] != 0) + return 1; + } + return 0; +} + +static int create_netpath(struct netpath *npdest, + struct path_param *p_params) +{ + struct viport_config *viport_config; + struct viport *viport; + struct vnic *vnic; + struct list_head *ptr; + int ret = 0; + + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (vnic->primary_path.viport) { + viport_config = vnic->primary_path.viport->config; + if ((viport_config->ioc_guid == p_params->ioc_guid) + && (viport_config->control_config.vnic_instance + == p_params->instance) + && (be64_to_cpu(p_params->ioc_guid))) { + SYS_ERROR("GUID %llx," + " INSTANCE %d already in use\n", + be64_to_cpu(p_params->ioc_guid), + p_params->instance); + ret = -EINVAL; + goto out; + } + } + + if (vnic->secondary_path.viport) { + viport_config = vnic->secondary_path.viport->config; + if ((viport_config->ioc_guid == p_params->ioc_guid) + && (viport_config->control_config.vnic_instance + == p_params->instance) + && 
(be64_to_cpu(p_params->ioc_guid))) { + SYS_ERROR("GUID %llx," + " INSTANCE %d already in use\n", + be64_to_cpu(p_params->ioc_guid), + p_params->instance); + ret = -EINVAL; + goto out; + } + } + } + + if (npdest->viport) { + SYS_ERROR("create_netpath: path already exists\n"); + ret = -EINVAL; + goto out; + } + + viport_config = config_alloc_viport(p_params); + if (!viport_config) { + SYS_ERROR("create_netpath: failed creating viport config\n"); + ret = -1; + goto out; + } + + /*User specified heartbeat value is in 1/100s of a sec*/ + if (p_params->heartbeat != -1) { + viport_config->hb_interval = + msecs_to_jiffies(p_params->heartbeat * 10); + viport_config->hb_timeout = + (p_params->heartbeat << 6) * 10000; /* usec */ + } + + viport_config->path_idx = 0; + + viport = viport_allocate(viport_config); + if (!viport) { + SYS_ERROR("create_netpath: failed creating viport\n"); + kfree(viport_config); + ret = -1; + goto out; + } + + npdest->viport = viport; + viport->parent = npdest; + viport->vnic = npdest->parent; + + if (is_dgid_zero(p_params->dgid) && p_params->ioc_guid != 0 + && p_params->pkey != 0) { + viport_kick(viport); + vnic_disconnected(npdest->parent, npdest); + } else { + printk(KERN_WARNING "Specified parameters IOCGUID=%llx, " + "P_Key=%x, DGID=", be64_to_cpu(p_params->ioc_guid), + p_params->pkey); + print_dgid(p_params->dgid); + printk(" insufficient for establishing %s path for interface " + "%s. Hence, path will not be established.\n", + (npdest->second_bias ? 
"secondary" : "primary"), + p_params->name); + } +out: + return ret; +} + +static struct vnic *create_vnic(struct path_param *param) +{ + struct vnic_config *vnic_config; + struct vnic *vnic; + struct list_head *ptr; + + SYS_INFO("create_vnic: name = %s\n", param->name); + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strcmp(vnic->config->name, param->name)) { + SYS_ERROR("vnic %s already exists\n", + param->name); + return NULL; + } + } + + vnic_config = config_alloc_vnic(); + if (!vnic_config) { + SYS_ERROR("create_vnic: failed creating vnic config\n"); + return NULL; + } + + if (param->rx_csum != -1) + vnic_config->use_rx_csum = param->rx_csum; + + if (param->tx_csum != -1) + vnic_config->use_tx_csum = param->tx_csum; + + strcpy(vnic_config->name, param->name); + vnic = vnic_allocate(vnic_config); + if (!vnic) { + SYS_ERROR("create_vnic: failed allocating vnic\n"); + goto free_vnic_config; + } + + init_completion(&vnic->dev_info.released); + + vnic->dev_info.dev.class = NULL; + vnic->dev_info.dev.parent = &interface_dev.dev; + vnic->dev_info.dev.release = vnic_release_dev; + snprintf(vnic->dev_info.dev.bus_id, BUS_ID_SIZE, + vnic_config->name); + + if (device_register(&vnic->dev_info.dev)) { + SYS_ERROR("create_vnic: error in registering" + " vnic class dev\n"); + goto free_vnic; + } + + if (sysfs_create_group(&vnic->dev_info.dev.kobj, + &vnic_dev_attr_group)) { + SYS_ERROR("create_vnic: error in creating" + "vnic attr group\n"); + goto err_attr; + + } + + if (vnic_setup_stats_files(vnic)) + goto err_stats; + + return vnic; +err_stats: + sysfs_remove_group(&vnic->dev_info.dev.kobj, + &vnic_dev_attr_group); +err_attr: + device_unregister(&vnic->dev_info.dev); + wait_for_completion(&vnic->dev_info.released); +free_vnic: + list_del(&vnic->list_ptrs); + kfree(vnic); +free_vnic_config: + kfree(vnic_config); + return NULL; +} + +ssize_t vnic_delete(struct device *dev, struct device_attribute *dev_attr, + const char *buf, 
size_t count) +{ + struct vnic *vnic; + struct list_head *ptr; + int ret = -EINVAL; + + if (count > IFNAMSIZ) { + printk(KERN_WARNING PFX "invalid vnic interface name\n"); + return ret; + } + + SYS_INFO("vnic_delete: name = %s\n", buf); + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strcmp(vnic->config->name, buf)) { + vnic_free(vnic); + return count; + } + } + + printk(KERN_WARNING PFX "vnic interface '%s' does not exist\n", buf); + return ret; +} + +static ssize_t show_viport_state(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct netpath *path = container_of(info, struct netpath, dev_info); + switch (path->viport->state) { + case VIPORT_DISCONNECTED: + return sprintf(buf, "VIPORT_DISCONNECTED\n"); + case VIPORT_CONNECTED: + return sprintf(buf, "VIPORT_CONNECTED\n"); + default: + return sprintf(buf, "INVALID STATE\n"); + } + +} + +static DEVICE_ATTR(viport_state, S_IRUGO, show_viport_state, NULL); + +static ssize_t show_link_state(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct netpath *path = container_of(info, struct netpath, dev_info); + + switch (path->viport->link_state) { + case LINK_UNINITIALIZED: + return sprintf(buf, "LINK_UNINITIALIZED\n"); + case LINK_INITIALIZE: + return sprintf(buf, "LINK_INITIALIZE\n"); + case LINK_INITIALIZECONTROL: + return sprintf(buf, "LINK_INITIALIZECONTROL\n"); + case LINK_INITIALIZEDATA: + return sprintf(buf, "LINK_INITIALIZEDATA\n"); + case LINK_CONTROLCONNECT: + return sprintf(buf, "LINK_CONTROLCONNECT\n"); + case LINK_CONTROLCONNECTWAIT: + return sprintf(buf, "LINK_CONTROLCONNECTWAIT\n"); + case LINK_INITVNICREQ: + return sprintf(buf, "LINK_INITVNICREQ\n"); + case LINK_INITVNICRSP: + return sprintf(buf, "LINK_INITVNICRSP\n"); + case LINK_BEGINDATAPATH: + return sprintf(buf, 
"LINK_BEGINDATAPATH\n"); + case LINK_CONFIGDATAPATHREQ: + return sprintf(buf, "LINK_CONFIGDATAPATHREQ\n"); + case LINK_CONFIGDATAPATHRSP: + return sprintf(buf, "LINK_CONFIGDATAPATHRSP\n"); + case LINK_DATACONNECT: + return sprintf(buf, "LINK_DATACONNECT\n"); + case LINK_DATACONNECTWAIT: + return sprintf(buf, "LINK_DATACONNECTWAIT\n"); + case LINK_XCHGPOOLREQ: + return sprintf(buf, "LINK_XCHGPOOLREQ\n"); + case LINK_XCHGPOOLRSP: + return sprintf(buf, "LINK_XCHGPOOLRSP\n"); + case LINK_INITIALIZED: + return sprintf(buf, "LINK_INITIALIZED\n"); + case LINK_IDLE: + return sprintf(buf, "LINK_IDLE\n"); + case LINK_IDLING: + return sprintf(buf, "LINK_IDLING\n"); + case LINK_CONFIGLINKREQ: + return sprintf(buf, "LINK_CONFIGLINKREQ\n"); + case LINK_CONFIGLINKRSP: + return sprintf(buf, "LINK_CONFIGLINKRSP\n"); + case LINK_CONFIGADDRSREQ: + return sprintf(buf, "LINK_CONFIGADDRSREQ\n"); + case LINK_CONFIGADDRSRSP: + return sprintf(buf, "LINK_CONFIGADDRSRSP\n"); + case LINK_REPORTSTATREQ: + return sprintf(buf, "LINK_REPORTSTATREQ\n"); + case LINK_REPORTSTATRSP: + return sprintf(buf, "LINK_REPORTSTATRSP\n"); + case LINK_HEARTBEATREQ: + return sprintf(buf, "LINK_HEARTBEATREQ\n"); + case LINK_HEARTBEATRSP: + return sprintf(buf, "LINK_HEARTBEATRSP\n"); + case LINK_RESET: + return sprintf(buf, "LINK_RESET\n"); + case LINK_RESETRSP: + return sprintf(buf, "LINK_RESETRSP\n"); + case LINK_RESETCONTROL: + return sprintf(buf, "LINK_RESETCONTROL\n"); + case LINK_RESETCONTROLRSP: + return sprintf(buf, "LINK_RESETCONTROLRSP\n"); + case LINK_DATADISCONNECT: + return sprintf(buf, "LINK_DATADISCONNECT\n"); + case LINK_CONTROLDISCONNECT: + return sprintf(buf, "LINK_CONTROLDISCONNECT\n"); + case LINK_CLEANUPDATA: + return sprintf(buf, "LINK_CLEANUPDATA\n"); + case LINK_CLEANUPCONTROL: + return sprintf(buf, "LINK_CLEANUPCONTROL\n"); + case LINK_DISCONNECTED: + return sprintf(buf, "LINK_DISCONNECTED\n"); + case LINK_RETRYWAIT: + return sprintf(buf, "LINK_RETRYWAIT\n"); + default: + return 
sprintf(buf, "INVALID STATE\n"); + + } + +} +static DEVICE_ATTR(link_state, S_IRUGO, show_link_state, NULL); + +static ssize_t show_heartbeat(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + /* hb_interval is in jiffies, convert it back to + * 1/100ths of a second + */ + return sprintf(buf, "%d\n", + (jiffies_to_msecs(path->viport->config->hb_interval)/10)); +} + +static DEVICE_ATTR(heartbeat, S_IRUGO, show_heartbeat, NULL); + +static ssize_t show_ioc_guid(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + return sprintf(buf, "%llx\n", + __be64_to_cpu(path->viport->config->ioc_guid)); +} + +static DEVICE_ATTR(ioc_guid, S_IRUGO, show_ioc_guid, NULL); + +static inline void get_dgid_string(u8 *dgid, char *buf) +{ + int i; + char holder[5]; + + for (i = 0; i < 16; i += 2) { + sprintf(holder, "%04x", be16_to_cpu(*(__be16 *)&dgid[i])); + strcat(buf, holder); + } + + strcat(buf, "\n"); +} + +static ssize_t show_dgid(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + get_dgid_string(path->viport->config->path_info.path.dgid.raw, buf); + + return strlen(buf); +} + +static DEVICE_ATTR(dgid, S_IRUGO, show_dgid, NULL); + +static ssize_t show_pkey(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + return sprintf(buf, "%x\n", path->viport->config->path_info.path.pkey); +} + +static DEVICE_ATTR(pkey, S_IRUGO, show_pkey, NULL); + 
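The GID formatting used by get_dgid_string() above (a 16-byte GID rendered as 32 hex digits, two bytes at a time in big-endian groups) can be sketched as a standalone userspace function. This is an illustrative plain-C stand-in for the kernel version: the explicit byte shifts replace be16_to_cpu(), and gid_to_string() is a hypothetical name, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a 16-byte GID as 32 lowercase hex digits into buf
 * (caller provides at least 33 bytes).  Bytes are consumed in
 * big-endian 16-bit groups, matching the on-wire GID layout. */
static void gid_to_string(const unsigned char *gid, char *buf)
{
	int i;

	buf[0] = '\0';
	for (i = 0; i < 16; i += 2) {
		char holder[5];

		/* (gid[i] << 8) | gid[i+1] is the portable equivalent of
		 * be16_to_cpu(*(__be16 *)&gid[i]) in the kernel code */
		sprintf(holder, "%04x", (gid[i] << 8) | gid[i + 1]);
		strcat(buf, holder);
	}
}
```

Note the inverse transformation appears in vnic_parse_options(): the dgid= value is required to be exactly 32 hex characters and is converted back two characters at a time with simple_strtoul().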
+static ssize_t show_hca_info(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + return sprintf(buf, "vnic-%s-%d\n", path->viport->config->ibdev->name, + path->viport->config->port); +} + +static DEVICE_ATTR(hca_info, S_IRUGO, show_hca_info, NULL); + +static ssize_t show_ioc_string(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + return sprintf(buf, "%s\n", path->viport->config->ioc_string); +} + +static DEVICE_ATTR(ioc_string, S_IRUGO, show_ioc_string, NULL); + +static ssize_t show_multicast_state(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + if (!(path->viport->features_supported & VNIC_FEAT_INBOUND_IB_MC)) + return sprintf(buf, "feature not enabled\n"); + + switch (path->viport->mc_info.state) { + case MCAST_STATE_INVALID: + return sprintf(buf, "state=Invalid\n"); + case MCAST_STATE_JOINING: + return sprintf(buf, "state=Joining MGID:" VNIC_GID_FMT "\n", + VNIC_GID_RAW_ARG(path->viport->mc_info.mgid.raw)); + case MCAST_STATE_ATTACHING: + return sprintf(buf, "state=Attaching MGID:" VNIC_GID_FMT + " MLID:%X\n", + VNIC_GID_RAW_ARG(path->viport->mc_info.mgid.raw), + path->viport->mc_info.mlid); + case MCAST_STATE_JOINED_ATTACHED: + return sprintf(buf, + "state=Joined & Attached MGID:" VNIC_GID_FMT + " MLID:%X\n", + VNIC_GID_RAW_ARG(path->viport->mc_info.mgid.raw), + path->viport->mc_info.mlid); + case MCAST_STATE_DETACHING: + return sprintf(buf, "state=Detaching MGID: " VNIC_GID_FMT "\n", + VNIC_GID_RAW_ARG(path->viport->mc_info.mgid.raw)); + case MCAST_STATE_RETRIED: + 
return sprintf(buf, "state=Retries Exceeded\n"); + } + return sprintf(buf, "invalid state\n"); +} + +static DEVICE_ATTR(multicast_state, S_IRUGO, show_multicast_state, NULL); + +static struct attribute *vnic_path_attrs[] = { + &dev_attr_viport_state.attr, + &dev_attr_link_state.attr, + &dev_attr_heartbeat.attr, + &dev_attr_ioc_guid.attr, + &dev_attr_dgid.attr, + &dev_attr_pkey.attr, + &dev_attr_hca_info.attr, + &dev_attr_ioc_string.attr, + &dev_attr_multicast_state.attr, + NULL +}; + +struct attribute_group vnic_path_attr_group = { + .attrs = vnic_path_attrs, +}; + + +static int setup_path_class_files(struct netpath *path, char *name) +{ + init_completion(&path->dev_info.released); + + path->dev_info.dev.class = NULL; + path->dev_info.dev.parent = &path->parent->dev_info.dev; + path->dev_info.dev.release = vnic_release_dev; + snprintf(path->dev_info.dev.bus_id, BUS_ID_SIZE, name); + + if (device_register(&path->dev_info.dev)) { + SYS_ERROR("error in registering path class dev\n"); + goto out; + } + + if (sysfs_create_group(&path->dev_info.dev.kobj, + &vnic_path_attr_group)) { + SYS_ERROR("error in creating vnic path group attrs"); + goto err_path; + } + + return 0; + +err_path: + device_unregister(&path->dev_info.dev); + wait_for_completion(&path->dev_info.released); +out: + return -1; + +} + +static inline void update_dgids(u8 *old, u8 *new, char *vnic_name, + char *path_name) +{ + int i; + + if (!memcmp(old, new, 16)) + return; + + printk(KERN_INFO PFX "Changing dgid from 0x"); + print_dgid(old); + printk(" to 0x"); + print_dgid(new); + printk(" for %s path of %s\n", path_name, vnic_name); + for (i = 0; i < 16; i++) + old[i] = new[i]; +} + +static inline void update_ioc_guids(struct path_param *params, + struct netpath *path, + char *vnic_name, char *path_name) +{ + u64 sid; + + if (path->viport->config->ioc_guid == params->ioc_guid) + return; + + printk(KERN_INFO PFX "Changing IOC GUID from 0x%llx to 0x%llx " + "for %s path of %s\n", + 
__be64_to_cpu(path->viport->config->ioc_guid), + __be64_to_cpu(params->ioc_guid), path_name, vnic_name); + + path->viport->config->ioc_guid = params->ioc_guid; + + sid = (SST_AGN << 56) | (SST_OUI << 32) | (CONTROL_PATH_ID << 8) + | IOC_NUMBER(be64_to_cpu(params->ioc_guid)); + + path->viport->config->control_config.ib_config.service_id = + cpu_to_be64(sid); + + sid = (SST_AGN << 56) | (SST_OUI << 32) | (DATA_PATH_ID << 8) + | IOC_NUMBER(be64_to_cpu(params->ioc_guid)); + + path->viport->config->data_config.ib_config.service_id = + cpu_to_be64(sid); +} + +static inline void update_pkeys(__be16 *old, __be16 *new, char *vnic_name, + char *path_name) +{ + if (*old == *new) + return; + + printk(KERN_INFO PFX "Changing P_Key from 0x%x to 0x%x " + "for %s path of %s\n", *old, *new, + path_name, vnic_name); + *old = *new; +} + +static void update_ioc_strings(struct path_param *params, struct netpath *path, + char *path_name) +{ + if (!strcmp(params->ioc_string, path->viport->config->ioc_string)) + return; + + printk(KERN_INFO PFX "Changing ioc_string to %s for %s path of %s\n", + params->ioc_string, path_name, params->name); + + strcpy(path->viport->config->ioc_string, params->ioc_string); +} + +static void update_path_parameters(struct path_param *params, + struct netpath *path) +{ + update_dgids(path->viport->config->path_info.path.dgid.raw, + params->dgid, params->name, + (path->second_bias ? "secondary" : "primary")); + + update_ioc_guids(params, path, params->name, + (path->second_bias ? "secondary" : "primary")); + + update_pkeys(&path->viport->config->path_info.path.pkey, + ¶ms->pkey, params->name, + (path->second_bias ? "secondary" : "primary")); + + update_ioc_strings(params, path, + (path->second_bias ? 
"secondary" : "primary")); +} + +static ssize_t update_params_and_connect(struct path_param *params, + struct netpath *path, size_t count) +{ + if (is_dgid_zero(params->dgid) && params->ioc_guid != 0 && + params->pkey != 0) { + + if (!memcmp(path->viport->config->path_info.path.dgid.raw, + params->dgid, 16) && + params->ioc_guid == path->viport->config->ioc_guid && + params->pkey == path->viport->config->path_info.path.pkey) { + + printk(KERN_WARNING PFX "All of the dgid, ioc_guid and " + "pkeys are same as the existing" + " one. Not updating values.\n"); + return -EINVAL; + } else { + if (path->viport->state == VIPORT_CONNECTED) { + printk(KERN_WARNING PFX "%s path of %s " + "interface is already in connected " + "state. Not updating values.\n", + (path->second_bias ? "Secondary" : "Primary"), + path->parent->config->name); + return -EINVAL; + } else { + update_path_parameters(params, path); + viport_kick(path->viport); + vnic_disconnected(path->parent, path); + return count; + } + } + } else { + printk(KERN_WARNING PFX "Either dgid, iocguid, pkey is zero. 
" + "No update.\n"); + return -EINVAL; + } +} + +ssize_t vnic_create_primary(struct device *dev, + struct device_attribute *dev_attr, const char *buf, + size_t count) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic_ib_port *target = + container_of(info, struct vnic_ib_port, pdev_info); + + struct path_param param; + int ret = -EINVAL; + struct vnic *vnic; + struct list_head *ptr; + + param.instance = 0; + param.rx_csum = -1; + param.tx_csum = -1; + param.heartbeat = -1; + param.ib_multicast = -1; + *param.ioc_string = '\0'; + + ret = vnic_parse_options(buf, ¶m); + + if (ret) + goto out; + + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strcmp(vnic->config->name, param.name)) { + ret = update_params_and_connect(¶m, + &vnic->primary_path, + count); + goto out; + } + } + + param.ibdev = target->dev->dev; + param.ibport = target; + param.port = target->port_num; + + vnic = create_vnic(¶m); + if (!vnic) { + printk(KERN_ERR PFX "creating vnic failed\n"); + ret = -EINVAL; + goto out; + } + + if (create_netpath(&vnic->primary_path, ¶m)) { + printk(KERN_ERR PFX "creating primary netpath failed\n"); + goto free_vnic; + } + + if (setup_path_class_files(&vnic->primary_path, "primary_path")) + goto free_vnic; + + if (vnic && !vnic->primary_path.viport) { + printk(KERN_ERR PFX "no valid netpaths\n"); + goto free_vnic; + } + + return count; + +free_vnic: + vnic_free(vnic); + ret = -EINVAL; +out: + return ret; +} + +ssize_t vnic_create_secondary(struct device *dev, + struct device_attribute *dev_attr, + const char *buf, size_t count) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic_ib_port *target = + container_of(info, struct vnic_ib_port, pdev_info); + + struct path_param param; + struct vnic *vnic = NULL; + int ret = -EINVAL; + struct list_head *ptr; + int found = 0; + + param.instance = 0; + param.rx_csum = -1; + param.tx_csum = -1; + param.heartbeat = -1; + 
param.ib_multicast = -1; + *param.ioc_string = '\0'; + + ret = vnic_parse_options(buf, ¶m); + + if (ret) + goto out; + + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strncmp(vnic->config->name, param.name, IFNAMSIZ)) { + if (vnic->secondary_path.viport) { + ret = update_params_and_connect(¶m, + &vnic->secondary_path, + count); + goto out; + } + found = 1; + break; + } + } + + if (!found) { + printk(KERN_ERR PFX + "primary connection with name '%s' does not exist\n", + param.name); + ret = -EINVAL; + goto out; + } + + param.ibdev = target->dev->dev; + param.ibport = target; + param.port = target->port_num; + + if (create_netpath(&vnic->secondary_path, ¶m)) { + printk(KERN_ERR PFX "creating secondary netpath failed\n"); + ret = -EINVAL; + goto out; + } + + if (setup_path_class_files(&vnic->secondary_path, "secondary_path")) + goto free_vnic; + + return count; + +free_vnic: + vnic_free(vnic); + ret = -EINVAL; +out: + return ret; +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h new file mode 100644 index 0000000..b41e770 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h @@ -0,0 +1,62 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_SYS_H_INCLUDED +#define VNIC_SYS_H_INCLUDED + +struct dev_info { + struct device dev; + struct completion released; +}; + +extern struct class vnic_class; +extern struct dev_info interface_dev; +extern struct attribute_group vnic_dev_attr_group; +extern struct attribute_group vnic_path_attr_group; +extern struct device_attribute dev_attr_create_primary; +extern struct device_attribute dev_attr_create_secondary; +extern struct device_attribute dev_attr_delete_vnic; + +extern void vnic_release_dev(struct device *dev); + +extern ssize_t vnic_create_primary(struct device *dev, + struct device_attribute *dev_attr, + const char *buf, size_t count); + +extern ssize_t vnic_create_secondary(struct device *dev, + struct device_attribute *dev_attr, + const char *buf, size_t count); + +extern ssize_t vnic_delete(struct device *dev, + struct device_attribute *dev_attr, + const char *buf, size_t count); +#endif /*VNIC_SYS_H_INCLUDED*/ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:20:25 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:50:25 +0530 Subject: [ofa-general] [PATCH 09/13] QLogic VNIC: IB Multicast for Ethernet broadcast/multicast In-Reply-To: 
<20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172025.31725.97795.stgit@localhost.localdomain> From: Usha Srinivasan Implementation of ethernet broadcasting and multicasting for QLogic VNIC interface by making use of underlying IB multicasting. Signed-off-by: Ramachandra K Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c | 332 +++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h | 76 +++++ 2 files changed, 408 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c new file mode 100644 index 0000000..044d447 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c @@ -0,0 +1,332 @@ +/* + * Copyright (c) 2008 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include "vnic_viport.h" +#include "vnic_netpath.h" +#include "vnic_main.h" +#include "vnic_config.h" +#include "vnic_control.h" +#include "vnic_util.h" +#include "vnic_ib.h" + +#define control_ifcfg_name(p)\ + ((p)->parent->parent->parent->config->name) + +#define SET_MCAST_STATE_INVALID \ +do { \ + viport->mc_info.state = MCAST_STATE_INVALID; \ + viport->mc_info.mc = NULL; \ + memset(&viport->mc_info.mgid, 0, sizeof(union ib_gid)); \ +} while (0); + +int vnic_mc_init(struct viport *viport) +{ + MCAST_FUNCTION("vnic_mc_init %p\n", viport); + SET_MCAST_STATE_INVALID; + viport->mc_info.retries = 0; + spin_lock_init(&viport->mc_info.lock); + + return 0; +} + +void vnic_mc_uninit(struct viport *viport) +{ + unsigned long flags; + MCAST_FUNCTION("vnic_mc_uninit %p\n", viport); + + spin_lock_irqsave(&viport->mc_info.lock, flags); + if ((viport->mc_info.state != MCAST_STATE_INVALID) && + (viport->mc_info.state != MCAST_STATE_RETRIED)) { + MCAST_ERROR("%s mcast state is not INVALID or RETRIED %d\n", + control_ifcfg_name(&viport->control), + viport->mc_info.state); + } + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + MCAST_FUNCTION("vnic_mc_uninit done\n"); +} + + +/* This function is called when NEED_MCAST_COMPLETION is set. + * It finishes off the join multicast work. 
+ */ +int vnic_mc_join_handle_completion(struct viport *viport) +{ + unsigned long flags; + unsigned int ret = 0; + + MCAST_FUNCTION("in vnic_mc_join_handle_completion\n"); + spin_lock_irqsave(&viport->mc_info.lock, flags); + if (viport->mc_info.state != MCAST_STATE_JOINING) { + MCAST_ERROR("%s unexpected mcast state in handle_completion: " + " %d\n", control_ifcfg_name(&viport->control), + viport->mc_info.state); + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return -1; + } + viport->mc_info.state = MCAST_STATE_ATTACHING; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + MCAST_INFO("%s calling ib_attach_mcast %lx mgid:" + VNIC_GID_FMT " mlid:%x\n", + control_ifcfg_name(&viport->control), jiffies, + VNIC_GID_RAW_ARG(viport->mc_info.mgid.raw), + viport->mc_info.mlid); + ret = ib_attach_mcast(viport->mc_data.ib_conn.qp, &viport->mc_info.mgid, + viport->mc_info.mlid); + if (ret) { + MCAST_ERROR("%s attach mcast qp failed %d\n", + control_ifcfg_name(&viport->control), ret); + return -1; + } + MCAST_INFO("%s attached\n", + control_ifcfg_name(&viport->control)); + spin_lock_irqsave(&viport->mc_info.lock, flags); + viport->mc_info.state = MCAST_STATE_JOINED_ATTACHED; + MCAST_INFO("%s qp attached to mcast group\n", + control_ifcfg_name(&viport->control)); + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return 0; +} + +/* NOTE: ib_sa.h says "returning a non-zero value from this callback will + * result in destroying the multicast tracking structure. 
+ */ +static int vnic_mc_join_complete(int status, + struct ib_sa_multicast *multicast) +{ + struct viport *viport = (struct viport *)multicast->context; + unsigned long flags; + + MCAST_FUNCTION("in vnic_mc_join_complete status:%x\n", status); + if (status) { + spin_lock_irqsave(&viport->mc_info.lock, flags); + if (status == -ENETRESET) { + SET_MCAST_STATE_INVALID; + viport->mc_info.retries = 0; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + MCAST_ERROR("%s got ENETRESET what's the right thing " + "to do?\n", + control_ifcfg_name(&viport->control)); + return status; + } + /* perhaps the mcgroup hasn't yet been created - retry */ + viport->mc_info.retries++; + viport->mc_info.mc = NULL; + if (viport->mc_info.retries > MAX_MCAST_JOIN_RETRIES) { + viport->mc_info.state = MCAST_STATE_RETRIED; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + MCAST_ERROR("%s join failed 0x%x - max retries:%d " + "exceeded\n", + control_ifcfg_name(&viport->control), + status, viport->mc_info.retries); + } else { + viport->mc_info.state = MCAST_STATE_INVALID; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + spin_lock_irqsave(&viport->lock, flags); + viport->updates |= NEED_MCAST_JOIN; + spin_unlock_irqrestore(&viport->lock, flags); + viport_kick(viport); + MCAST_ERROR("%s join failed 0x%x - retrying; " + "retries:%d\n", + control_ifcfg_name(&viport->control), + status, viport->mc_info.retries); + } + return status; + } + + /* finish join work from main state loop for viport - in case + * the work itself cannot be done in a callback environment */ + spin_lock_irqsave(&viport->lock, flags); + viport->mc_info.mlid = be16_to_cpu(multicast->rec.mlid); + viport->updates |= NEED_MCAST_COMPLETION; + spin_unlock_irqrestore(&viport->lock, flags); + viport_kick(viport); + MCAST_INFO("%s set NEED_MCAST_COMPLETION %x %x\n", + control_ifcfg_name(&viport->control), + multicast->rec.mlid, viport->mc_info.mlid); + return 0; +} + +void vnic_mc_join_setup(struct viport *viport, 
union ib_gid *mgid) +{ + unsigned long flags; + + MCAST_FUNCTION("in vnic_mc_join_setup\n"); + spin_lock_irqsave(&viport->mc_info.lock, flags); + if (viport->mc_info.state != MCAST_STATE_INVALID) { + if (viport->mc_info.state == MCAST_STATE_DETACHING) { + MCAST_ERROR("%s detach in progress\n", + control_ifcfg_name(&viport->control)); + } else if (viport->mc_info.state == MCAST_STATE_RETRIED) { + MCAST_ERROR("%s max join retries exceeded\n", + control_ifcfg_name(&viport->control)); + } else { + /* join/attach in progress or done */ + /* verify that the current mgid is same as prev mgid */ + if (memcmp(mgid, &viport->mc_info.mgid, sizeof(union ib_gid)) != 0) { + /* Separate MGID for each IOC */ + MCAST_ERROR("%s Multicast Group MGIDs not " + "unique; mgids: " VNIC_GID_FMT + " " VNIC_GID_FMT "\n", + control_ifcfg_name(&viport->control), + VNIC_GID_RAW_ARG(mgid->raw), + VNIC_GID_RAW_ARG(viport->mc_info.mgid.raw)); + } else + MCAST_INFO("%s join already issued: %d\n", + control_ifcfg_name(&viport->control), + viport->mc_info.state); + + } + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return; + } + viport->mc_info.mgid = *mgid; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + spin_lock_irqsave(&viport->lock, flags); + viport->updates |= NEED_MCAST_JOIN; + spin_unlock_irqrestore(&viport->lock, flags); + viport_kick(viport); + MCAST_INFO("%s set NEED_MCAST_JOIN \n", + control_ifcfg_name(&viport->control)); +} + +int vnic_mc_join(struct viport *viport) +{ + struct ib_sa_mcmember_rec rec; + ib_sa_comp_mask comp_mask; + unsigned long flags; + + MCAST_FUNCTION("in vnic_mc_join\n"); + if (!viport->mc_data.ib_conn.qp) { + MCAST_ERROR("%s qp is NULL\n", + control_ifcfg_name(&viport->control)); + return -1; + } + spin_lock_irqsave(&viport->mc_info.lock, flags); + if (viport->mc_info.state != MCAST_STATE_INVALID) { + MCAST_INFO("%s join already issued: %d\n", + control_ifcfg_name(&viport->control), + viport->mc_info.state); + 
spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return 0; + } + viport->mc_info.state = MCAST_STATE_JOINING; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + + memset(&rec, 0, sizeof(rec)); + rec.join_state = 2; /* bit 1 is Nonmember */ + rec.mgid = viport->mc_info.mgid; + rec.port_gid = viport->config->path_info.path.sgid; + + comp_mask = IB_SA_MCMEMBER_REC_MGID | + IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_JOIN_STATE; + + MCAST_INFO("%s calling ib_sa_join_multicast %lx mgid:" + VNIC_GID_FMT " port_gid: " VNIC_GID_FMT "\n", + control_ifcfg_name(&viport->control), jiffies, + VNIC_GID_RAW_ARG(rec.mgid.raw), + VNIC_GID_RAW_ARG(rec.port_gid.raw)); + + viport->mc_info.mc = ib_sa_join_multicast(&vnic_sa_client, + viport->config->ibdev, viport->config->port, + &rec, comp_mask, GFP_KERNEL, + vnic_mc_join_complete, viport); + + if (IS_ERR(viport->mc_info.mc)) { + MCAST_ERROR("%s ib_sa_join_multicast failed " VNIC_GID_FMT + ".\n", + control_ifcfg_name(&viport->control), + VNIC_GID_RAW_ARG(rec.mgid.raw)); + spin_lock_irqsave(&viport->mc_info.lock, flags); + viport->mc_info.state = MCAST_STATE_INVALID; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return -1; + } + MCAST_INFO("%s join issued ib_sa_join_multicast mgid:" + VNIC_GID_FMT " port_gid: " VNIC_GID_FMT "\n", + control_ifcfg_name(&viport->control), + VNIC_GID_RAW_ARG(rec.mgid.raw), + VNIC_GID_RAW_ARG(rec.port_gid.raw)); + + return 0; +} + +void vnic_mc_leave(struct viport *viport) +{ + unsigned long flags; + unsigned int ret; + struct ib_sa_multicast *mc; + + MCAST_FUNCTION("vnic_mc_leave \n"); + + spin_lock_irqsave(&viport->mc_info.lock, flags); + if ((viport->mc_info.state == MCAST_STATE_INVALID) || + (viport->mc_info.state == MCAST_STATE_RETRIED)) { + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return; + } + + if (viport->mc_info.state == MCAST_STATE_JOINED_ATTACHED) { + + viport->mc_info.state = MCAST_STATE_DETACHING; + spin_unlock_irqrestore(&viport->mc_info.lock, 
flags); + ret = ib_detach_mcast(viport->mc_data.ib_conn.qp, + &viport->mc_info.mgid, + viport->mc_info.mlid); + if (ret) { + MCAST_ERROR("%s detach failed %d\n", + control_ifcfg_name(&viport->control), ret); + return; + } + MCAST_INFO("%s detached successfully\n", + control_ifcfg_name(&viport->control)); + spin_lock_irqsave(&viport->mc_info.lock, flags); + } + mc = viport->mc_info.mc; + SET_MCAST_STATE_INVALID; + viport->mc_info.retries = 0; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + + if (mc) { + MCAST_INFO("%s calling ib_sa_free_multicast\n", + control_ifcfg_name(&viport->control)); + ib_sa_free_multicast(mc); + } + MCAST_FUNCTION("vnic_mc_leave done\n"); + return; +} + + diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h new file mode 100644 index 0000000..0e5499d --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h @@ -0,0 +1,76 @@ +/* + * Copyright (c) 2008 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution.
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef __VNIC_MULTICAST_H__ +#define __VNIC_MULTICAST_H__ + +enum { + MCAST_STATE_INVALID = 0x00, /* join not attempted or failed */ + MCAST_STATE_JOINING = 0x01, /* join mcgroup in progress */ + MCAST_STATE_ATTACHING = 0x02, /* join completed with success, + * attach qp to mcgroup in progress + */ + MCAST_STATE_JOINED_ATTACHED = 0x03, /* join completed with success */ + MCAST_STATE_DETACHING = 0x04, /* detach qp in progress */ + MCAST_STATE_RETRIED = 0x05, /* retried join and failed */ +}; + +#define MAX_MCAST_JOIN_RETRIES 5 /* used to retry join */ + +struct mc_info { + u8 state; + spinlock_t lock; + union ib_gid mgid; + u16 mlid; + struct ib_sa_multicast *mc; + u8 retries; +}; + + +int vnic_mc_init(struct viport *viport); +void vnic_mc_uninit(struct viport *viport); + +/* This function is called when a viport gets a multicast mgid from EVIC + and must join the multicast group. It sets the NEED_MCAST_JOIN flag, which + results in vnic_mc_join being called later. */ +void vnic_mc_join_setup(struct viport *viport, union ib_gid *mgid); + +/* This function is called when the NEED_MCAST_JOIN flag is set. */ +int vnic_mc_join(struct viport *viport); + +/* This function is called when NEED_MCAST_COMPLETION is set. + It finishes off the join multicast work.
*/ +int vnic_mc_join_handle_completion(struct viport *viport); + +void vnic_mc_leave(struct viport *viport); + +#endif /* __VNIC_MULTICAST_H__ */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:20:55 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:50:55 +0530 Subject: [ofa-general] [PATCH 10/13] QLogic VNIC: Driver Statistics collection In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172055.31725.70663.stgit@localhost.localdomain> From: Amar Mudrankit Collection of statistics about QLogic VNIC interfaces is implemented in this patch. Signed-off-by: Ramachandra K Signed-off-by: Poornima Kamath --- drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c | 234 ++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h | 497 +++++++++++++++++++++++++ 2 files changed, 731 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c new file mode 100644 index 0000000..cebcc26 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c @@ -0,0 +1,234 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include + +#include "vnic_main.h" + +cycles_t recv_ref; + +/* + * TODO: Statistics reporting for control path, data path, + * RDMA times, IOs etc + * + */ +static ssize_t show_lifetime(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + cycles_t time = get_cycles() - vnic->statistics.start_time; + + return sprintf(buf, "%llu\n", (unsigned long long)time); +} + +static DEVICE_ATTR(lifetime, S_IRUGO, show_lifetime, NULL); + +static ssize_t show_conntime(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + if (vnic->statistics.conn_time) + return sprintf(buf, "%llu\n", + (unsigned long long)vnic->statistics.conn_time); + return 0; +} + +static DEVICE_ATTR(connection_time, S_IRUGO, show_conntime, NULL); + +static ssize_t show_disconnects(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, 
stat_info); + u32 num; + + if (vnic->statistics.disconn_ref) + num = vnic->statistics.disconn_num + 1; + else + num = vnic->statistics.disconn_num; + + return sprintf(buf, "%d\n", num); +} + +static DEVICE_ATTR(disconnects, S_IRUGO, show_disconnects, NULL); + +static ssize_t show_total_disconn_time(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + cycles_t time; + + if (vnic->statistics.disconn_ref) + time = vnic->statistics.disconn_time + + get_cycles() - vnic->statistics.disconn_ref; + else + time = vnic->statistics.disconn_time; + + return sprintf(buf, "%llu\n", (unsigned long long)time); +} + +static DEVICE_ATTR(total_disconn_time, S_IRUGO, show_total_disconn_time, NULL); + +static ssize_t show_carrier_losses(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + u32 num; + + if (vnic->statistics.carrier_ref) + num = vnic->statistics.carrier_off_num + 1; + else + num = vnic->statistics.carrier_off_num; + + return sprintf(buf, "%d\n", num); +} + +static DEVICE_ATTR(carrier_losses, S_IRUGO, show_carrier_losses, NULL); + +static ssize_t show_total_carr_loss_time(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + cycles_t time; + + if (vnic->statistics.carrier_ref) + time = vnic->statistics.carrier_off_time + + get_cycles() - vnic->statistics.carrier_ref; + else + time = vnic->statistics.carrier_off_time; + + return sprintf(buf, "%llu\n", (unsigned long long)time); +} + +static DEVICE_ATTR(total_carrier_loss_time, S_IRUGO, + show_total_carr_loss_time, NULL); + +static ssize_t show_total_recv_time(struct 
device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%llu\n", + (unsigned long long)vnic->statistics.recv_time); +} + +static DEVICE_ATTR(total_recv_time, S_IRUGO, show_total_recv_time, NULL); + +static ssize_t show_recvs(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.recv_num); +} + +static DEVICE_ATTR(recvs, S_IRUGO, show_recvs, NULL); + +static ssize_t show_multicast_recvs(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.multicast_recv_num); +} + +static DEVICE_ATTR(multicast_recvs, S_IRUGO, show_multicast_recvs, NULL); + +static ssize_t show_total_xmit_time(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%llu\n", + (unsigned long long)vnic->statistics.xmit_time); +} + +static DEVICE_ATTR(total_xmit_time, S_IRUGO, show_total_xmit_time, NULL); + +static ssize_t show_xmits(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.xmit_num); +} + +static DEVICE_ATTR(xmits, S_IRUGO, show_xmits, NULL); + +static ssize_t show_failed_xmits(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info 
*info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.xmit_fail); +} + +static DEVICE_ATTR(failed_xmits, S_IRUGO, show_failed_xmits, NULL); + +static struct attribute *vnic_stats_attrs[] = { + &dev_attr_lifetime.attr, + &dev_attr_xmits.attr, + &dev_attr_total_xmit_time.attr, + &dev_attr_failed_xmits.attr, + &dev_attr_recvs.attr, + &dev_attr_multicast_recvs.attr, + &dev_attr_total_recv_time.attr, + &dev_attr_connection_time.attr, + &dev_attr_disconnects.attr, + &dev_attr_total_disconn_time.attr, + &dev_attr_carrier_losses.attr, + &dev_attr_total_carrier_loss_time.attr, + NULL +}; + +struct attribute_group vnic_stats_attr_group = { + .attrs = vnic_stats_attrs, +}; diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h new file mode 100644 index 0000000..af77794 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h @@ -0,0 +1,497 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_STATS_H_INCLUDED +#define VNIC_STATS_H_INCLUDED + +#include "vnic_main.h" +#include "vnic_ib.h" +#include "vnic_sys.h" + +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + +static inline void vnic_connected_stats(struct vnic *vnic) +{ + if (vnic->statistics.conn_time == 0) { + vnic->statistics.conn_time = + get_cycles() - vnic->statistics.start_time; + } + + if (vnic->statistics.disconn_ref != 0) { + vnic->statistics.disconn_time += + get_cycles() - vnic->statistics.disconn_ref; + vnic->statistics.disconn_num++; + vnic->statistics.disconn_ref = 0; + } + +} + +static inline void vnic_stop_xmit_stats(struct vnic *vnic) +{ + if (vnic->statistics.xmit_ref == 0) + vnic->statistics.xmit_ref = get_cycles(); +} + +static inline void vnic_restart_xmit_stats(struct vnic *vnic) +{ + if (vnic->statistics.xmit_ref != 0) { + vnic->statistics.xmit_off_time += + get_cycles() - vnic->statistics.xmit_ref; + vnic->statistics.xmit_off_num++; + vnic->statistics.xmit_ref = 0; + } +} + +static inline void vnic_recv_pkt_stats(struct vnic *vnic) +{ + vnic->statistics.recv_time += get_cycles() - recv_ref; + vnic->statistics.recv_num++; +} + +static inline void vnic_multicast_recv_pkt_stats(struct vnic *vnic) +{ + vnic->statistics.multicast_recv_num++; +} + +static inline void vnic_pre_pkt_xmit_stats(cycles_t *time) +{ + *time = get_cycles(); +} + +static inline void vnic_post_pkt_xmit_stats(struct vnic *vnic, + cycles_t time) +{ + vnic->statistics.xmit_time += get_cycles() - time; + 
vnic->statistics.xmit_num++; + +} + +static inline void vnic_xmit_fail_stats(struct vnic *vnic) +{ + vnic->statistics.xmit_fail++; +} + +static inline void vnic_carrier_loss_stats(struct vnic *vnic) +{ + if (vnic->statistics.carrier_ref != 0) { + vnic->statistics.carrier_off_time += + get_cycles() - vnic->statistics.carrier_ref; + vnic->statistics.carrier_off_num++; + vnic->statistics.carrier_ref = 0; + } +} + +static inline int vnic_setup_stats_files(struct vnic *vnic) +{ + init_completion(&vnic->stat_info.released); + vnic->stat_info.dev.class = NULL; + vnic->stat_info.dev.parent = &vnic->dev_info.dev; + vnic->stat_info.dev.release = vnic_release_dev; + snprintf(vnic->stat_info.dev.bus_id, BUS_ID_SIZE, + "stats"); + + if (device_register(&vnic->stat_info.dev)) { + SYS_ERROR("create_vnic: error in registering" + " stat class dev\n"); + goto stats_out; + } + + if (sysfs_create_group(&vnic->stat_info.dev.kobj, + &vnic_stats_attr_group)) + goto err_stats_file; + + return 0; +err_stats_file: + device_unregister(&vnic->stat_info.dev); + wait_for_completion(&vnic->stat_info.released); +stats_out: + return -1; +} + +static inline void vnic_cleanup_stats_files(struct vnic *vnic) +{ + /* remove the group from the same kobject it was created on */ + sysfs_remove_group(&vnic->stat_info.dev.kobj, + &vnic_stats_attr_group); + device_unregister(&vnic->stat_info.dev); + wait_for_completion(&vnic->stat_info.released); +} + +static inline void vnic_disconn_stats(struct vnic *vnic) +{ + if (!vnic->statistics.disconn_ref) + vnic->statistics.disconn_ref = get_cycles(); + + if (vnic->statistics.carrier_ref == 0) + vnic->statistics.carrier_ref = get_cycles(); +} + +static inline void vnic_alloc_stats(struct vnic *vnic) +{ + vnic->statistics.start_time = get_cycles(); +} + +static inline void control_note_rsptime_stats(cycles_t *time) +{ + *time = get_cycles(); +} + +static inline void control_update_rsptime_stats(struct control *control, + cycles_t response_time) +{ + response_time -= control->statistics.request_time; +
control->statistics.response_time += response_time; + control->statistics.response_num++; + if (control->statistics.response_max < response_time) + control->statistics.response_max = response_time; + if ((control->statistics.response_min == 0) || + (control->statistics.response_min > response_time)) + control->statistics.response_min = response_time; + +} + +static inline void control_note_reqtime_stats(struct control *control) +{ + control->statistics.request_time = get_cycles(); +} + +static inline void control_timeout_stats(struct control *control) +{ + control->statistics.timeout_num++; +} + +static inline void data_kickreq_stats(struct data *data) +{ + data->statistics.kick_reqs++; +} + +static inline void data_no_xmitbuf_stats(struct data *data) +{ + data->statistics.no_xmit_bufs++; +} + +static inline void data_xmits_stats(struct data *data) +{ + data->statistics.xmit_num++; +} + +static inline void data_recvs_stats(struct data *data) +{ + data->statistics.recv_num++; +} + +static inline void data_note_kickrcv_time(void) +{ + recv_ref = get_cycles(); +} + +static inline void data_rcvkicks_stats(struct data *data) +{ + data->statistics.kick_recvs++; +} + + +static inline void vnic_ib_conntime_stats(struct vnic_ib_conn *ib_conn) +{ + ib_conn->statistics.connection_time = get_cycles(); +} + +static inline void vnic_ib_note_comptime_stats(cycles_t *time) +{ + *time = get_cycles(); +} + +static inline void vnic_ib_callback_stats(struct vnic_ib_conn *ib_conn) +{ + ib_conn->statistics.num_callbacks++; +} + +static inline void vnic_ib_comp_stats(struct vnic_ib_conn *ib_conn, + u32 *comp_num) +{ + ib_conn->statistics.num_ios++; + *comp_num = *comp_num + 1; + +} + +static inline void vnic_ib_io_stats(struct io *io, + struct vnic_ib_conn *ib_conn, + cycles_t comp_time) +{ + if ((io->type == RECV) || (io->type == RECV_UD)) + io->time = comp_time; + else if (io->type == RDMA) { + ib_conn->statistics.rdma_comp_time += comp_time - io->time; + 
ib_conn->statistics.rdma_comp_ios++; + } else if (io->type == SEND) { + ib_conn->statistics.send_comp_time += comp_time - io->time; + ib_conn->statistics.send_comp_ios++; + } +} + +static inline void vnic_ib_maxio_stats(struct vnic_ib_conn *ib_conn, + u32 comp_num) +{ + if (comp_num > ib_conn->statistics.max_ios) + ib_conn->statistics.max_ios = comp_num; +} + +static inline void vnic_ib_connected_time_stats(struct vnic_ib_conn *ib_conn) +{ + ib_conn->statistics.connection_time = + get_cycles() - ib_conn->statistics.connection_time; + +} + +static inline void vnic_ib_pre_rcvpost_stats(struct vnic_ib_conn *ib_conn, + struct io *io, + cycles_t *time) +{ + *time = get_cycles(); + if (io->time != 0) { + ib_conn->statistics.recv_comp_time += *time - io->time; + ib_conn->statistics.recv_comp_ios++; + } + +} + +static inline void vnic_ib_post_rcvpost_stats(struct vnic_ib_conn *ib_conn, + cycles_t time) +{ + ib_conn->statistics.recv_post_time += get_cycles() - time; + ib_conn->statistics.recv_post_ios++; +} + +static inline void vnic_ib_pre_sendpost_stats(struct io *io, + cycles_t *time) +{ + io->time = *time = get_cycles(); +} + +static inline void vnic_ib_post_sendpost_stats(struct vnic_ib_conn *ib_conn, + struct io *io, + cycles_t time) +{ + time = get_cycles() - time; + if (io->swr.opcode == IB_WR_RDMA_WRITE) { + ib_conn->statistics.rdma_post_time += time; + ib_conn->statistics.rdma_post_ios++; + } else { + ib_conn->statistics.send_post_time += time; + ib_conn->statistics.send_post_ios++; + } +} +#else /*CONFIG_INIFINIBAND_VNIC_STATS*/ + +static inline void vnic_connected_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_stop_xmit_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_restart_xmit_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_recv_pkt_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_multicast_recv_pkt_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_pre_pkt_xmit_stats(cycles_t *time) +{ + ; +} 
+ +static inline void vnic_post_pkt_xmit_stats(struct vnic *vnic, + cycles_t time) +{ + ; +} + +static inline void vnic_xmit_fail_stats(struct vnic *vnic) +{ + ; +} + +static inline int vnic_setup_stats_files(struct vnic *vnic) +{ + return 0; +} + +static inline void vnic_cleanup_stats_files(struct vnic *vnic) +{ + ; +} + +static inline void vnic_carrier_loss_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_disconn_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_alloc_stats(struct vnic *vnic) +{ + ; +} + +static inline void control_note_rsptime_stats(cycles_t *time) +{ + ; +} + +static inline void control_update_rsptime_stats(struct control *control, + cycles_t response_time) +{ + ; +} + +static inline void control_note_reqtime_stats(struct control *control) +{ + ; +} + +static inline void control_timeout_stats(struct control *control) +{ + ; +} + +static inline void data_kickreq_stats(struct data *data) +{ + ; +} + +static inline void data_no_xmitbuf_stats(struct data *data) +{ + ; +} + +static inline void data_xmits_stats(struct data *data) +{ + ; +} + +static inline void data_recvs_stats(struct data *data) +{ + ; +} + +static inline void data_note_kickrcv_time(void) +{ + ; +} + +static inline void data_rcvkicks_stats(struct data *data) +{ + ; +} + +static inline void vnic_ib_conntime_stats(struct vnic_ib_conn *ib_conn) +{ + ; +} + +static inline void vnic_ib_note_comptime_stats(cycles_t *time) +{ + ; +} + +static inline void vnic_ib_callback_stats(struct vnic_ib_conn *ib_conn) + +{ + ; +} +static inline void vnic_ib_comp_stats(struct vnic_ib_conn *ib_conn, + u32 *comp_num) +{ + ; +} + +static inline void vnic_ib_io_stats(struct io *io, + struct vnic_ib_conn *ib_conn, + cycles_t comp_time) +{ + ; +} + +static inline void vnic_ib_maxio_stats(struct vnic_ib_conn *ib_conn, + u32 comp_num) +{ + ; +} + +static inline void vnic_ib_connected_time_stats(struct vnic_ib_conn *ib_conn) +{ + ; +} + +static inline void 
vnic_ib_pre_rcvpost_stats(struct vnic_ib_conn *ib_conn, + struct io *io, + cycles_t *time) +{ + ; +} + +static inline void vnic_ib_post_rcvpost_stats(struct vnic_ib_conn *ib_conn, + cycles_t time) +{ + ; +} + +static inline void vnic_ib_pre_sendpost_stats(struct io *io, + cycles_t *time) +{ + ; +} + +static inline void vnic_ib_post_sendpost_stats(struct vnic_ib_conn *ib_conn, + struct io *io, + cycles_t time) +{ + ; +} +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ + +#endif /* VNIC_STATS_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:21:26 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:51:26 +0530 Subject: [ofa-general] [PATCH 11/13] QLogic VNIC: Driver utility file - implements various utility macros In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172126.31725.48554.stgit@localhost.localdomain> From: Poornima Kamath This patch adds the driver utility file, which mainly contains utility macros for debugging the QLogic VNIC driver. Signed-off-by: Ramachandra K Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_util.h | 251 ++++++++++++++++++++++ 1 files changed, 251 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_util.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_util.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_util.h new file mode 100644 index 0000000..4d7d540 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_util.h @@ -0,0 +1,251 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_UTIL_H_INCLUDED +#define VNIC_UTIL_H_INCLUDED + +#define MODULE_NAME "QLGC_VNIC" + +#define VNIC_MAJORVERSION 1 +#define VNIC_MINORVERSION 1 + +#define is_power_of2(value) (((value) & ((value) - 1)) == 0) +#define ALIGN_DOWN(x, a) ((x)&(~((a)-1))) + +extern u32 vnic_debug; + +enum { + DEBUG_IB_INFO = 0x00000001, + DEBUG_IB_FUNCTION = 0x00000002, + DEBUG_IB_FSTATUS = 0x00000004, + DEBUG_IB_ASSERTS = 0x00000008, + DEBUG_CONTROL_INFO = 0x00000010, + DEBUG_CONTROL_FUNCTION = 0x00000020, + DEBUG_CONTROL_PACKET = 0x00000040, + DEBUG_CONFIG_INFO = 0x00000100, + DEBUG_DATA_INFO = 0x00001000, + DEBUG_DATA_FUNCTION = 0x00002000, + DEBUG_NETPATH_INFO = 0x00010000, + DEBUG_VIPORT_INFO = 0x00100000, + DEBUG_VIPORT_FUNCTION = 0x00200000, + DEBUG_LINK_STATE = 0x00400000, + DEBUG_VNIC_INFO = 0x01000000, + DEBUG_VNIC_FUNCTION = 0x02000000, + DEBUG_MCAST_INFO = 0x04000000, + DEBUG_MCAST_FUNCTION = 0x08000000, + DEBUG_SYS_INFO = 0x10000000, + DEBUG_SYS_VERBOSE = 0x40000000 +}; + +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_DEBUG +#define PRINT(level, x, fmt, arg...) \ + printk(level "%s: %s: %s, line %d: " fmt, \ + MODULE_NAME, x, __FILE__, __LINE__, ##arg) + +#define PRINT_CONDITIONAL(level, x, condition, fmt, arg...) \ + do { \ + if (condition) \ + printk(level "%s: %s: %s, line %d: " fmt, \ + MODULE_NAME, x, __FILE__, __LINE__, \ + ##arg); \ + } while (0) +#else +#define PRINT(level, x, fmt, arg...) \ + printk(level "%s: " fmt, MODULE_NAME, ##arg) + +#define PRINT_CONDITIONAL(level, x, condition, fmt, arg...) \ + do { \ + if (condition) \ + printk(level "%s: %s: " fmt, \ + MODULE_NAME, x, ##arg); \ + } while (0) +#endif /*CONFIG_INFINIBAND_QLGC_VNIC_DEBUG*/ + +#define IB_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "IB", fmt, ##arg) +#define IB_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "IB", fmt, ##arg) + +#define IB_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "IB", \ + (vnic_debug & DEBUG_IB_FUNCTION), \ + fmt, ##arg) + +#define IB_INFO(fmt, arg...)
\ + PRINT_CONDITIONAL(KERN_INFO, \ + "IB", \ + (vnic_debug & DEBUG_IB_INFO), \ + fmt, ##arg) + +#define IB_ASSERT(x) \ + do { \ + if ((vnic_debug & DEBUG_IB_ASSERTS) && !(x)) \ + panic("%s assertion failed, file: %s," \ + " line %d: ", \ + MODULE_NAME, __FILE__, __LINE__); \ + } while (0) + +#define CONTROL_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "CONTROL", fmt, ##arg) +#define CONTROL_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "CONTROL", fmt, ##arg) + +#define CONTROL_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "CONTROL", \ + (vnic_debug & DEBUG_CONTROL_INFO), \ + fmt, ##arg) + +#define CONTROL_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "CONTROL", \ + (vnic_debug & DEBUG_CONTROL_FUNCTION), \ + fmt, ##arg) + +#define CONTROL_PACKET(pkt) \ + do { \ + if (vnic_debug & DEBUG_CONTROL_PACKET) \ + control_log_control_packet(pkt); \ + } while (0) + +#define CONFIG_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "CONFIG", fmt, ##arg) +#define CONFIG_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "CONFIG", fmt, ##arg) + +#define CONFIG_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "CONFIG", \ + (vnic_debug & DEBUG_CONFIG_INFO), \ + fmt, ##arg) + +#define DATA_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "DATA", fmt, ##arg) +#define DATA_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "DATA", fmt, ##arg) + +#define DATA_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "DATA", \ + (vnic_debug & DEBUG_DATA_INFO), \ + fmt, ##arg) + +#define DATA_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "DATA", \ + (vnic_debug & DEBUG_DATA_FUNCTION), \ + fmt, ##arg) + + +#define MCAST_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "MCAST", fmt, ##arg) +#define MCAST_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "MCAST", fmt, ##arg) + +#define MCAST_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "MCAST", \ + (vnic_debug & DEBUG_MCAST_INFO), \ + fmt, ##arg) + +#define MCAST_FUNCTION(fmt, arg...)
\ + PRINT_CONDITIONAL(KERN_INFO, \ + "MCAST", \ + (vnic_debug & DEBUG_MCAST_FUNCTION), \ + fmt, ##arg) + +#define NETPATH_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "NETPATH", fmt, ##arg) +#define NETPATH_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "NETPATH", fmt, ##arg) + +#define NETPATH_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "NETPATH", \ + (vnic_debug & DEBUG_NETPATH_INFO), \ + fmt, ##arg) + +#define VIPORT_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "VIPORT", fmt, ##arg) +#define VIPORT_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "VIPORT", fmt, ##arg) + +#define VIPORT_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "VIPORT", \ + (vnic_debug & DEBUG_VIPORT_INFO), \ + fmt, ##arg) + +#define VIPORT_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "VIPORT", \ + (vnic_debug & DEBUG_VIPORT_FUNCTION), \ + fmt, ##arg) + +#define LINK_STATE(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "LINK", \ + (vnic_debug & DEBUG_LINK_STATE), \ + fmt, ##arg) + +#define VNIC_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "NIC", fmt, ##arg) +#define VNIC_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "NIC", fmt, ##arg) +#define VNIC_INIT(fmt, arg...) \ + PRINT(KERN_INFO, "NIC", fmt, ##arg) + +#define VNIC_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "NIC", \ + (vnic_debug & DEBUG_VNIC_INFO), \ + fmt, ##arg) + +#define VNIC_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "NIC", \ + (vnic_debug & DEBUG_VNIC_FUNCTION), \ + fmt, ##arg) + +#define SYS_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "SYS", fmt, ##arg) +#define SYS_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "SYS", fmt, ##arg) + +#define SYS_INFO(fmt, arg...) 
\ + PRINT_CONDITIONAL(KERN_INFO, \ + "SYS", \ + (vnic_debug & DEBUG_SYS_INFO), \ + fmt, ##arg) + +#endif /* VNIC_UTIL_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:21:56 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:51:56 +0530 Subject: [ofa-general] [PATCH 12/13] QLogic VNIC: Driver Kconfig and Makefile. In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172156.31725.94843.stgit@localhost.localdomain> From: Ramachandra K Kconfig and Makefile for the QLogic VNIC driver. Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/Kconfig | 28 ++++++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/Makefile | 13 +++++++++++++ 2 files changed, 41 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/Kconfig create mode 100644 drivers/infiniband/ulp/qlgc_vnic/Makefile diff --git a/drivers/infiniband/ulp/qlgc_vnic/Kconfig b/drivers/infiniband/ulp/qlgc_vnic/Kconfig new file mode 100644 index 0000000..6a08770 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/Kconfig @@ -0,0 +1,28 @@ +config INFINIBAND_QLGC_VNIC + tristate "QLogic VNIC - Support for QLogic Ethernet Virtual I/O Controller" + depends on INFINIBAND && NETDEVICES && INET + ---help--- + Support for the QLogic Ethernet Virtual I/O Controller + (EVIC). In conjunction with the EVIC, this provides virtual + ethernet interfaces and transports ethernet packets over + InfiniBand so that you can communicate with Ethernet networks + using your IB device. + +config INFINIBAND_QLGC_VNIC_DEBUG + bool "QLogic VNIC Verbose debugging" + depends on INFINIBAND_QLGC_VNIC + default n + ---help--- + This option causes verbose debugging code to be compiled + into the QLogic VNIC driver. The output can be turned on via the + vnic_debug module parameter. 
+ +config INFINIBAND_QLGC_VNIC_STATS + bool "QLogic VNIC Statistics" + depends on INFINIBAND_QLGC_VNIC + default n + ---help--- + This option compiles statistics collecting code into the + data path of the QLogic VNIC driver to help in profiling and fine + tuning. This adds some overhead in the interest of gathering + data. diff --git a/drivers/infiniband/ulp/qlgc_vnic/Makefile b/drivers/infiniband/ulp/qlgc_vnic/Makefile new file mode 100644 index 0000000..509dd67 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/Makefile @@ -0,0 +1,13 @@ +obj-$(CONFIG_INFINIBAND_QLGC_VNIC) += qlgc_vnic.o + +qlgc_vnic-y := vnic_main.o \ + vnic_ib.o \ + vnic_viport.o \ + vnic_control.o \ + vnic_data.o \ + vnic_netpath.o \ + vnic_config.o \ + vnic_sys.o \ + vnic_multicast.o + +qlgc_vnic-$(CONFIG_INFINIBAND_QLGC_VNIC_STATS) += vnic_stats.o From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:22:26 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:52:26 +0530 Subject: [ofa-general] [PATCH 13/13] QLogic VNIC: Modifications to IB Kconfig and Makefile In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172226.31725.57890.stgit@localhost.localdomain> From: Ramachandra K This patch modifies the top-level InfiniBand Kconfig and Makefile to include the QLogic VNIC as a new ULP.
Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/Kconfig | 2 ++ drivers/infiniband/Makefile | 1 + 2 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index a5dc78a..0775df5 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -53,4 +53,6 @@ source "drivers/infiniband/ulp/srp/Kconfig" source "drivers/infiniband/ulp/iser/Kconfig" +source "drivers/infiniband/ulp/qlgc_vnic/Kconfig" + endif # INFINIBAND diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index ed35e44..845271e 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -9,3 +9,4 @@ obj-$(CONFIG_INFINIBAND_NES) += hw/nes/ obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/ +obj-$(CONFIG_INFINIBAND_QLGC_VNIC) += ulp/qlgc_vnic/ From eli at dev.mellanox.co.il Wed Apr 30 10:39:16 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 20:39:16 +0300 Subject: [ofa-general] [PATCH] IB/ipoib: fix net queue lockup Message-ID: <1209577156.1790.11.camel@mtls03> >From 1644c62982335b5cf67300ccba2533016e240d6a Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Wed, 30 Apr 2008 20:37:31 +0300 Subject: [PATCH] IB/ipoib: fix net queue lockup Fix the lockup of the net queue introduced in the split CQ patch. The idea is to arm the send CQ just before posting the last send request to the QP. When the completion handler is called, drain the CQ. Since not all the CQEs might already be in the CQ, verify that the net queue has been woken up. If not, arm a timer and drain again from the timer function. In order to reduce the number of cases in which the queue is stopped, we should use a larger tx queue. Roland, we have seen a few other cases where a large tx queue is needed. I think we should choose a larger default value than the current 64. How about 256?
--- drivers/infiniband/ulp/ipoib/ipoib.h | 2 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 47 +++++++++++++++++++++++++--- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 3 +- 3 files changed, 46 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 9044f88..b46baf2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -334,6 +334,7 @@ struct ipoib_dev_priv { #endif int hca_caps; struct ipoib_ethtool_st ethtool; + struct timer_list poll_timer; }; struct ipoib_ah { @@ -404,6 +405,7 @@ extern struct workqueue_struct *ipoib_workqueue; int ipoib_poll(struct napi_struct *napi, int budget); void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr); +void send_comp_handler(struct ib_cq *cq, void *dev_ptr); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, struct ib_pd *pd, struct ib_ah_attr *attr); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 97b815c..e620a90 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -461,6 +461,26 @@ void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) netif_rx_schedule(dev, &priv->napi); } +static void drain_tx_cq(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned long flags; + + spin_lock_irqsave(&priv->tx_lock, flags); + while(poll_tx(priv)) + ; /* nothing */ + + if (netif_queue_stopped(dev)) + mod_timer(&priv->poll_timer, jiffies + 1); + + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +void send_comp_handler(struct ib_cq *cq, void *dev_ptr) +{ + drain_tx_cq((struct net_device *)dev_ptr); +} + static inline int post_send(struct ipoib_dev_priv *priv, unsigned int wr_id, struct ib_ah *address, u32 qpn, @@ -555,12 +575,22 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, else priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM; + if (++priv->tx_outstanding == 
ipoib_sendq_size) { + ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); + if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP)) + ipoib_warn(priv, "request notify on send queue failed\n"); + netif_stop_queue(dev); + } + if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), address->ah, qpn, tx_req, phead, hlen))) { ipoib_warn(priv, "post_send failed\n"); ++dev->stats.tx_errors; + --priv->tx_outstanding; ipoib_dma_unmap_tx(priv->ca, tx_req); dev_kfree_skb_any(skb); + if (netif_queue_stopped(dev)) + netif_wake_queue(dev); } else { dev->trans_start = jiffies; @@ -568,14 +598,11 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, ++priv->tx_head; skb_orphan(skb); - if (++priv->tx_outstanding == ipoib_sendq_size) { - ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); - netif_stop_queue(dev); - } } if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) - poll_tx(priv); + while(poll_tx(priv)) + ; /* nothing */ } static void __ipoib_reap_ah(struct net_device *dev) @@ -609,6 +636,11 @@ void ipoib_reap_ah(struct work_struct *work) round_jiffies_relative(HZ)); } +static void ipoib_ib_tx_timer_func(unsigned long ctx) +{ + drain_tx_cq((struct net_device *)ctx); +} + int ipoib_ib_dev_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -645,6 +677,10 @@ int ipoib_ib_dev_open(struct net_device *dev) queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, round_jiffies_relative(HZ)); + init_timer(&priv->poll_timer); + priv->poll_timer.function = ipoib_ib_tx_timer_func; + priv->poll_timer.data = (unsigned long)dev; + set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); return 0; @@ -810,6 +846,7 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) ipoib_dbg(priv, "All sends and receives done.\n"); timeout: + del_timer_sync(&priv->poll_timer); qp_attr.qp_state = IB_QPS_RESET; if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE)) ipoib_warn(priv, "Failed to modify QP to RESET state\n"); diff --git 
a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index c1e7ece..706384d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -187,7 +187,8 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) goto out_free_mr; } - priv->send_cq = ib_create_cq(priv->ca, NULL, NULL, dev, ipoib_sendq_size, 0); + priv->send_cq = ib_create_cq(priv->ca, send_comp_handler, NULL, dev, + ipoib_sendq_size, 0); if (IS_ERR(priv->send_cq)) { printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name); goto out_free_recv_cq; -- 1.5.5 From swise at opengridcomputing.com Wed Apr 30 11:28:55 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 13:28:55 -0500 Subject: [ofa-general] Re: iwarp-specific async events In-Reply-To: References: <4818A243.1090201@opengridcomputing.com> Message-ID: <4818BA67.9000201@opengridcomputing.com> Roland Dreier wrote: > > I'm looking for a good way to trigger iwarp QP flushing on a normal > > disconnect for user mode QPs. The async event notification provider > > ops function is one way I can do it easily with the currently > > infrastructure, if we add some new event types. For example, if a > > fatal error occurs on a QP which causes the connection to be aborted, > > then the kernel driver will mark the user qp as "in error" and post a > > FATAL_QP event. When the app reaps that event, the libcxgb3 async > > event ops function will flush the user's qp. However for a normal non > > fatal close, no async event is posted. But one should be. The iWARP > > verbs specify many async event types that I think we need to add at > > some point. Case in point: > > > > LLP Close Complete (qp event) - The TCP connection completed and no > > SQ WQEs were flushed (normal close) > > Yeah, it makes sense just to add any iWARP events that make sense and > don't fit the existing set of IB events. 
We already have IB-specific > stuff for path migration etc. > > > There is a whole slew of other events. The above event, however, is > key in that libcxgb3 could trigger a qp flush when this event is > reaped by the application. Currently, the flushing of the QP is only > triggered by fatal connections errors as described above and/or if the > application tries to post on a QP that has been marked in error by the > kernel. However, If the app does neither, then the flush never > happens. > > On the other hand, how does cxgb3 know when an application has reaped > the event? Do we need to add code to the uverbs module to know when an > async event has reached userspace? > > I meant libcxgb3, not the kernel modules. The kernel driver knows the connection went down and the qp needs flushing. That's who posted the async event. The driver just needs a way to kick the library to do the flush because the kernel driver cannot touch the user structs (without painful synchronization). So the library will discover this when the app reaps the async event via the context ops async_event function that libcxgb3 registers. Steve. From akepner at sgi.com Wed Apr 30 12:23:54 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Wed, 30 Apr 2008 12:23:54 -0700 Subject: [ofa-general] IPoIB-UD TX timeouts (OFED 1.2) Message-ID: <20080430192354.GG26724@sgi.com> At a customer site running OFED 1.2 we are seeing the following - after ~10s of hours of stressing IPoIB, the card apparently stops generating TX completions. (These are MT25204 cards in x86_64 boxes, and we've seen this with a couple f/w versions, including the latest.) We get something like: kernel: NETDEV WATCHDOG: ib0: transmit timed out kernel: ib0: transmit timeout: latency 1972 msecs kernel: ib0: queue stopped 1, tx_head 3271, tx_tail 3207 and that repeats "forever". And to simplify things, we can produce this behavior in datagram mode.
As long as only datagram mode is in use, the TX code in the IPoIB driver seems quite straightforward. The only reason I can imagine that we'd fail to get a timely TX completion would be if link-level flow control were to throttle us. And I'd expect that to be a transient condition... Am I overlooking something? Anyone seen similar? Suggestions for debugging? -- Arthur From liranl at mellanox.co.il Wed Apr 30 12:56:28 2008 From: liranl at mellanox.co.il (Liran Liss) Date: Wed, 30 Apr 2008 22:56:28 +0300 Subject: [ofa-general][PATCH] Re: mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: Message-ID: <40FA0A8088E8A441973D37502F00933E3A24@mtlexch01.mtl.com> > > I would just like to see an approach that is fully thought through and > gives a way for applications/kernel drivers to choose a CQ vector based > on some information about what CPU it will go to. > Isn't the decision of which CPU an MSI-X is routed to (and hence, to which CPU an EQ is bound to) determined by userspace? (either by the irq balancer process or by manually setting /proc/irq/<irq>/smp_affinity)? I am not sure we aren't better off leaving this to user-space: both application and interrupt affinity are administrative tasks. We can also use installation scripts to set a "default" configuration in which vector 0 is bound to cpu0, vector 1 is bound to cpu1, etc. > If we want to add a way to allow a request for round-robin, that is > fine, but I don't think we want to change the default to round-robin, > unless someone can come up with a workload where it actually helps. Several IPoIB partitions can easily saturate a single core if their Rx interrupts are not handled by several CPUs. This is not any different from multiple Ethernet NICs whose interrupts are balanced today by the irq balancer. We can argue that IPoIB can use a special "round-robin" vector while leaving the default vector fixed to a single EQ.
However, there is essentially no difference between IPoIB and other IB ULPs: an IB HCA is actually a platform for other services, each with its own queues that are directly accessed by HW, each with its own CQs and interrupt moderation. Putting all these ULPs on a single EQ will prevent interrupt balancing. What are we risking in making the default action to spread interrupts? --Liran From eli at dev.mellanox.co.il Wed Apr 30 13:00:55 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 23:00:55 +0300 Subject: [ofa-general] IPoIB-UD TX timeouts (OFED 1.2) In-Reply-To: <20080430192354.GG26724@sgi.com> References: <20080430192354.GG26724@sgi.com> Message-ID: <4e6a6b3c0804301300q57b4b562r854e337ff8706222@mail.gmail.com> Artur, when it happens please: 1. Check the link error counters. 2. Disconnect and reconnect the cable and see if it recovers. On 4/30/08, akepner at sgi.com wrote: > > At a customer site running OFED 1.2 we are seeing the > following - after ~10s of hours of stressing IPoIB, > the card apparently stops generating TX completions. > (These are MT25204 cards in x86_64 boxes, and we've seen > this with a couple f/w versions, including the latest.) > > We get something like: > > kernel: NETDEV WATCHDOG: ib0: transmit timed out > kernel: ib0: transmit timeout: latency 1972 msecs > kernel: ib0: queue stopped 1, tx_head 3271, tx_tail 3207 > > and that repeats "forever". > > And to simplify things, we can produce this behavior in > datagram mode. > > As long as only datagram mode is in use, the TX code in the > IPoIB driver seems quite straightforward. The only reason I > can imagine that we'd fail to get a timely TX completion > would be if link-level flow control were to throttle us. And > I'd expect that to be a transient condition... Am I > ovelooking something? Anyone seen similar? Suggestions for > debugging? 
> > -- > Arthur > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From eli at dev.mellanox.co.il Wed Apr 30 13:02:34 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 23:02:34 +0300 Subject: [ofa-general] IPoIB-UD TX timeouts (OFED 1.2) In-Reply-To: <4e6a6b3c0804301300q57b4b562r854e337ff8706222@mail.gmail.com> References: <20080430192354.GG26724@sgi.com> <4e6a6b3c0804301300q57b4b562r854e337ff8706222@mail.gmail.com> Message-ID: <4e6a6b3c0804301302i1fc42d90u9a0ac7be9048b8eb@mail.gmail.com> On 4/30/08, Eli Cohen wrote: > Artur, > when it happens please: > 1. Check the link error counters. > 2. Disconnect and reconnect the cable and see if it recovers. > Sorry for misspelling your name :-) From makc at sgi.com Wed Apr 30 13:59:47 2008 From: makc at sgi.com (Max Matveev) Date: Thu, 1 May 2008 06:59:47 +1000 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets Message-ID: <18456.56771.908062.459625@kuku.melbourne.sgi.com> IB GID has the same format as IPv6 address, IPv6 addresses are resolvable via DNS' AAAA or A6 records, you can go from IPv4 to name to IPv6 address without reinventing the wheel. It would not help with replacing arp use in rdma_cm though. max From swise at opengridcomputing.com Wed Apr 30 14:21:09 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 16:21:09 -0500 Subject: [ofa-general] [GIT PULL ofed-1.3.1] - chelsio changes for ofed-1.3.1 Message-ID: <4818E2C5.7060907@opengridcomputing.com> Vlad, Please pull from: git://git.openfabrics.org/~swise/ofed-1.3 ofed_kernel This will sync up ofed-1.3.1 with all the important upstream fixes since ofed-1.3.
The patch files added are: kernel_patches/fixes/iw_cxgb3_0080_Fail_Loopback_Connections.patch kernel_patches/fixes/iw_cxgb3_0090_Fix_shift_calc_in_build_phys_page_list_for_1-entry_page_lists.patch kernel_patches/fixes/iw_cxgb3_0100_Return_correct_max_inline_data_when_creating_a_QP.patch kernel_patches/fixes/iw_cxgb3_0110_Fix_iwch_create_cq_off-by-one_error.patch kernel_patches/fixes/iw_cxgb3_0120_Dont_access_a_cm_id_after_dropping_reference.patch kernel_patches/fixes/iw_cxgb3_0130_Correctly_set_the_max_mr_size_device_attribute.patch kernel_patches/fixes/iw_cxgb3_0140_Correctly_serialize_peer_abort_path.patch kernel_patches/fixes/iw_cxgb3_0150_Support_peer-2-peer_connection_setup.patch Thanks, Steve. From swise at opengridcomputing.com Wed Apr 30 14:23:40 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 16:23:40 -0500 Subject: [ofa-general] [GIT PULL ofed-1.3.1] libcxgb3 version 1.2.0 Message-ID: <4818E35C.4050206@opengridcomputing.com> Vlad, Please pull in version 1.2.0 of libcxgb3. This is needed for the ofed-1.3.1 kernel drivers. Pull from: git://git.openfabrics.org/~swise/libcxgb3 ofed_1_3_1 Thanks, Steve. From jgunthorpe at obsidianresearch.com Wed Apr 30 14:30:51 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 30 Apr 2008 15:30:51 -0600 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <18456.56771.908062.459625@kuku.melbourne.sgi.com> References: <18456.56771.908062.459625@kuku.melbourne.sgi.com> Message-ID: <20080430213051.GX24525@obsidianresearch.com> On Thu, May 01, 2008 at 06:59:47AM +1000, Max Matveev wrote: > IB GID has the same format as IPv6 address, IPv6 addresses are > resolvable via DNS' AAAA or A6 records, you can go from IPv4 to name > to IPv6 address without reinventing the wheel. Well, you can't just assume that a AAAA record associated with the reverse of a IPv4 is a GID - it could be a legitimate IPv6 address. 
The GID space and IPv6 space are completely distinct, despite the same format of the address. The only way I could see to do this with DNS is to introduce a new record type for GIDs.. Alternatively, you could use DNS to manage a mapping table, ala the reverse map: 1.0.0.10.ipv4.ibta-addr. AAAA fd83:609c:bdc8:1:213:72ff:fe29:e65d Jason From roland.list at gmail.com Wed Apr 30 15:21:17 2008 From: roland.list at gmail.com (Roland Dreier) Date: Wed, 30 Apr 2008 15:21:17 -0700 Subject: [ofa-general] [GIT PULL ofed-1.3.1] - chelsio changes for ofed-1.3.1 In-Reply-To: <4818E2C5.7060907@opengridcomputing.com> References: <4818E2C5.7060907@opengridcomputing.com> Message-ID: Steve -- did the IRD/ORD mixup fix get included? (It's 1f71f503 "RDMA/cxgb3: Program hardware IRD with correct value") in the upstream kernel On Wed, Apr 30, 2008 at 2:21 PM, Steve Wise wrote: > Vlad, > > Please pull from: > > git://git.openfabrics.org/~swise/ofed-1.3 ofed_kernel > > This will sync up ofed-1.3.1 with all the important upstream fixes since > ofed-1.3. The patch files added are: > > kernel_patches/fixes/iw_cxgb3_0080_Fail_Loopback_Connections.patch > > kernel_patches/fixes/iw_cxgb3_0090_Fix_shift_calc_in_build_phys_page_list_for_1-entry_page_lists.patch > > kernel_patches/fixes/iw_cxgb3_0100_Return_correct_max_inline_data_when_creating_a_QP.patch > > kernel_patches/fixes/iw_cxgb3_0110_Fix_iwch_create_cq_off-by-one_error.patch > > kernel_patches/fixes/iw_cxgb3_0120_Dont_access_a_cm_id_after_dropping_reference.patch > > kernel_patches/fixes/iw_cxgb3_0130_Correctly_set_the_max_mr_size_device_attribute.patch > > kernel_patches/fixes/iw_cxgb3_0140_Correctly_serialize_peer_abort_path.patch > > kernel_patches/fixes/iw_cxgb3_0150_Support_peer-2-peer_connection_setup.patch > > > Thanks, > > Steve. 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Wed Apr 30 15:24:18 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 15:24:18 -0700 Subject: [ofa-general] Re: [ewg] [GIT PULL ofed-1.3.1] libcxgb3 version 1.2.0 In-Reply-To: <4818E35C.4050206@opengridcomputing.com> (Steve Wise's message of "Wed, 30 Apr 2008 16:23:40 -0500") References: <4818E35C.4050206@opengridcomputing.com> Message-ID: Steve -- If you put a tarball (from make dist ;) on openfabrics.org, I'll update the Debian packages. From swise at opengridcomputing.com Wed Apr 30 15:25:40 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 17:25:40 -0500 Subject: [ofa-general] [GIT PULL ofed-1.3.1] - chelsio changes for ofed-1.3.1 In-Reply-To: References: <4818E2C5.7060907@opengridcomputing.com> Message-ID: <4818F1E4.1080202@opengridcomputing.com> Roland Dreier wrote: > Steve -- did the IRD/ORD mixup fix get included? (It's 1f71f503 > "RDMA/cxgb3: Program hardware IRD with correct value") in the upstream > kernel > > Oops. Good catch. No worries though, I've got another series to post (including the qp flush bug NFSRDMA found) for ofed-1.3.1 so i'll add this one. Thanks, Steve. > On Wed, Apr 30, 2008 at 2:21 PM, Steve Wise wrote: > >> Vlad, >> >> Please pull from: >> >> git://git.openfabrics.org/~swise/ofed-1.3 ofed_kernel >> >> This will sync up ofed-1.3.1 with all the important upstream fixes since >> ofed-1.3. 
The patch files added are: >> >> kernel_patches/fixes/iw_cxgb3_0080_Fail_Loopback_Connections.patch >> >> kernel_patches/fixes/iw_cxgb3_0090_Fix_shift_calc_in_build_phys_page_list_for_1-entry_page_lists.patch >> >> kernel_patches/fixes/iw_cxgb3_0100_Return_correct_max_inline_data_when_creating_a_QP.patch >> >> kernel_patches/fixes/iw_cxgb3_0110_Fix_iwch_create_cq_off-by-one_error.patch >> >> kernel_patches/fixes/iw_cxgb3_0120_Dont_access_a_cm_id_after_dropping_reference.patch >> >> kernel_patches/fixes/iw_cxgb3_0130_Correctly_set_the_max_mr_size_device_attribute.patch >> >> kernel_patches/fixes/iw_cxgb3_0140_Correctly_serialize_peer_abort_path.patch >> >> kernel_patches/fixes/iw_cxgb3_0150_Support_peer-2-peer_connection_setup.patch >> >> >> Thanks, >> >> Steve. >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> >> From rdreier at cisco.com Wed Apr 30 15:25:40 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 15:25:40 -0700 Subject: [ofa-general] Re: [PATCH 00/13] QLogic Virtual NIC (VNIC) Driver In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> (Ramachandra K.'s message of "Wed, 30 Apr 2008 22:45:52 +0530") References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: > This is the QLogic Virtual NIC driver patch series which has been tested > against your for-2.6.26 and for-2.6.27 branches. We intended these patches to > make it to the 2.6.26 kernel, but if it is too late for the 2.6.26 merge window > please consider them for 2.6.27. Yes, *WAY* too late for 2.6.26, given that today is the last day of the merge window, and that things that get merged need to be ready before the merge window opens. > The driver compiles cleanly with sparse endianness checking enabled. 
> We have > also tested the driver with lockdep checking enabled. > > We have run these patches through checkpatch.pl and the only warnings are > related to lines slightly longer than 80 columns in some of the statements. All good news. Will review and I hope get this into 2.6.27. - R. From swise at opengridcomputing.com Wed Apr 30 15:26:22 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 17:26:22 -0500 Subject: [ofa-general] Re: [ewg] [GIT PULL ofed-1.3.1] libcxgb3 version 1.2.0 In-Reply-To: References: <4818E35C.4050206@opengridcomputing.com> Message-ID: <4818F20E.3040500@opengridcomputing.com> Roland Dreier wrote: > Steve -- If you put a tarball (from make dist ;) on openfabrics.org, > I'll update the Debian packages. > I plan to do this soon. Steve. From rdreier at cisco.com Wed Apr 30 15:30:03 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 15:30:03 -0700 Subject: [ofa-general][PATCH] Re: mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: <40FA0A8088E8A441973D37502F00933E3A24@mtlexch01.mtl.com> (Liran Liss's message of "Wed, 30 Apr 2008 22:56:28 +0300") References: <40FA0A8088E8A441973D37502F00933E3A24@mtlexch01.mtl.com> Message-ID: > > I would just like to see an approach that is fully thought through and > > gives a way for applications/kernel drivers to choose a CQ vector based > > on some information about what CPU it will go to. > Isn't the decision of which CPU an MSI-X is routed to (and hence, to > which CPU an EQ is bound) determined by userspace? (either by the irq > balancer process or by manually setting /proc/irq/<irq>/smp_affinity)? Yes, but how can anything tell which IRQ number corresponds to a given "CQ vector" number? (And don't be too stuck on MSI-X, since ehca uses some completely different GX-bus related thing to get multiple interrupts) > What are we risking in making the default action to spread interrupts?
There are fairly plausible scenarios like a multi-threaded app where each thread creates a send CQ and a receive CQ, which should both be bound to the same CPU as the thread. If we spread all CQs then it's impossible to get thread-locality. I'm not saying that round-robin is necessarily a bad default policy, but I do think there needs to be a complete picture of how that policy can be overridden before we go for multiple interrupt vectors. - R. From arlin.r.davis at intel.com Wed Apr 30 15:57:54 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Wed, 30 Apr 2008 15:57:54 -0700 Subject: [ofa-general] [PATCH] [dat2.0] dapl: fix post_ext_send, post_send, post_recv to handle 0 byte's and NULL iov handles Message-ID: and return errno with verbs post failures. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_dto.h | 20 ++++++++++++-------- dapl/openib_cma/dapl_ib_extensions.c | 3 --- 2 files changed, 12 insertions(+), 11 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_dto.h b/dapl/openib_cma/dapl_ib_dto.h index b111e5e..ffb5dca 100644 --- a/dapl/openib_cma/dapl_ib_dto.h +++ b/dapl/openib_cma/dapl_ib_dto.h @@ -124,7 +124,7 @@ dapls_ib_post_recv ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_recv") ); + return( dapl_convert_errno(errno,"ibv_recv") ); return DAT_SUCCESS; } @@ -202,7 +202,8 @@ dapls_ib_post_send ( if (cookie != NULL) cookie->val.dto.size = total_len; - if ((op_type == OP_RDMA_WRITE) || (op_type == OP_RDMA_READ)) { + if (wr.num_sge && + (op_type == OP_RDMA_WRITE || op_type == OP_RDMA_READ)) { wr.wr.rdma.remote_addr = remote_iov->virtual_address; wr.wr.rdma.rkey = remote_iov->rmr_context; dapl_dbg_log(DAPL_DBG_TYPE_EP, @@ -234,7 +235,7 @@ dapls_ib_post_send ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_send") ); + return( dapl_convert_errno(errno,"ibv_send") ); 
dapl_dbg_log(DAPL_DBG_TYPE_EP," post_snd: returned\n"); return DAT_SUCCESS; @@ -357,12 +358,15 @@ dapls_ib_post_ext_send ( /* OP_RDMA_WRITE)IMMED has direct IB wr_type mapping */ dapl_dbg_log(DAPL_DBG_TYPE_EP, " post_ext: rkey 0x%x va %#016Lx immed=0x%x\n", - remote_iov->rmr_context, - remote_iov->virtual_address, immed_data); + remote_iov?remote_iov->rmr_context:0, + remote_iov?remote_iov->virtual_address:0, + immed_data); wr.imm_data = immed_data; - wr.wr.rdma.remote_addr = remote_iov->virtual_address; - wr.wr.rdma.rkey = remote_iov->rmr_context; + if (wr.num_sge) { + wr.wr.rdma.remote_addr = remote_iov->virtual_address; + wr.wr.rdma.rkey = remote_iov->rmr_context; + } break; case OP_COMP_AND_SWAP: /* OP_COMP_AND_SWAP has direct IB wr_type mapping */ @@ -411,7 +415,7 @@ dapls_ib_post_ext_send ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_send") ); + return( dapl_convert_errno(errno,"ibv_send") ); dapl_dbg_log(DAPL_DBG_TYPE_EP," post_snd: returned\n"); return DAT_SUCCESS; diff --git a/dapl/openib_cma/dapl_ib_extensions.c b/dapl/openib_cma/dapl_ib_extensions.c index 3132ffb..52b238f 100755 --- a/dapl/openib_cma/dapl_ib_extensions.c +++ b/dapl/openib_cma/dapl_ib_extensions.c @@ -185,9 +185,6 @@ dapli_post_ext( IN DAT_EP_HANDLE ep_handle, if (DAPL_BAD_HANDLE(ep_handle, DAPL_MAGIC_EP)) return(DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_EP)); - if ((NULL == remote_iov) || (NULL == local_iov)) - return DAT_INVALID_PARAMETER; - ep_ptr = (DAPL_EP *) ep_handle; qp_ptr = ep_ptr->qp_handle; -- 1.5.2.5 From arlin.r.davis at intel.com Wed Apr 30 15:57:50 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 30 Apr 2008 15:57:50 -0700 Subject: [ofa-general] [PATCH] [dat1.2] dapl: fix post_send, post_recv to handle 0 byte's and NULL iov handles Message-ID: <000901c8ab15$9d72f9c0$daba020a@amr.corp.intel.com> and return errno with verbs post failures. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_dto.h | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_dto.h b/dapl/openib_cma/dapl_ib_dto.h index 52b189b..f45da35 100644 --- a/dapl/openib_cma/dapl_ib_dto.h +++ b/dapl/openib_cma/dapl_ib_dto.h @@ -120,7 +120,7 @@ dapls_ib_post_recv ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_recv") ); + return( dapl_convert_errno(errno,"ibv_recv") ); return DAT_SUCCESS; } @@ -199,7 +199,8 @@ dapls_ib_post_send ( if (cookie != NULL) cookie->val.dto.size = total_len; - if ((op_type == OP_RDMA_WRITE) || (op_type == OP_RDMA_READ)) { + if (wr.num_sge && + (op_type == OP_RDMA_WRITE || op_type == OP_RDMA_READ)) { wr.wr.rdma.remote_addr = remote_iov->target_address; wr.wr.rdma.rkey = remote_iov->rmr_context; dapl_dbg_log(DAPL_DBG_TYPE_EP, @@ -230,7 +231,7 @@ dapls_ib_post_send ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_send") ); + return( dapl_convert_errno(errno,"ibv_send") ); dapl_dbg_log(DAPL_DBG_TYPE_EP," post_snd: returned\n"); return DAT_SUCCESS; -- 1.5.2.5 From rdreier at cisco.com Wed Apr 30 19:55:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 19:55:24 -0700 Subject: [ofa-general] [PATCH] IB/ipoib: fix net queue lockup In-Reply-To: <1209577156.1790.11.camel@mtls03> (Eli Cohen's message of "Wed, 30 Apr 2008 20:39:16 +0300") References: <1209577156.1790.11.camel@mtls03> Message-ID: thanks, looks like a good solution, applied, just adding an ipoib_ prefix since > +void send_comp_handler(struct ib_cq *cq, void *dev_ptr) is too generic a name for a global symbol. 
By the way I figured out the crash on unload -- it was an mlx4 bug that I introduced, which is fixed by: IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf() When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I changed things around so that mlx4_ib_alloc_cq_buf() and mlx4_ib_free_cq_buf() were used everywhere they could be. However, I screwed up the number of entries passed into mlx4_ib_alloc_cq_buf() in a couple places -- the function bumps the number of entries internally, so the caller shouldn't add 1 as well. Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf() can cause the cleanup to go off the end of an array and corrupt allocator state in interesting ways. Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 2f199c5..4521319 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -246,7 +246,7 @@ err_mtt: if (context) ib_umem_release(cq->umem); else - mlx4_ib_free_cq_buf(dev, &cq->buf, entries); + mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); err_db: if (!context) @@ -434,7 +434,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) mlx4_ib_db_unmap_user(to_mucontext(cq->uobject->context), &mcq->db); ib_umem_release(mcq->umem); } else { - mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); + mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe); mlx4_db_free(dev->dev, &mcq->db); } -- 1.5.5.1 From rdreier at cisco.com Wed Apr 30 20:05:43 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 20:05:43 -0700 Subject: [ofa-general] [PATCH] IB/ipoib: fix net queue lockup In-Reply-To: <1209577156.1790.11.camel@mtls03> (Eli Cohen's message of "Wed, 30 Apr 2008 20:39:16 +0300") References: <1209577156.1790.11.camel@mtls03> Message-ID: > we have seen a few other cases where a large tx queue is needed.
I > think we should choose a larger default value than the current 64. maybe yes, maybe no... what are the cases where it is needed? The send queue is basically acting as a "shock absorber" for bursty traffic. If the queue is filling up because of a steady traffic rate, then making the queue bigger means it will just take a little longer to fill. The way a longer send queue helps I guess is if the send queue is emptying out before the transmit queue is woken up... with small packets I suppose it doesn't take long for the send queue to drain completely. - R. From rdreier at cisco.com Wed Apr 30 20:46:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 20:46:24 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a couple of fixes for medium-impact bugs. If they can go into -rc1, great; otherwise the world won't end if they end up in -rc2. 
Eli Cohen (1): IB/ipoib: Fix transmit queue stalling forever Roland Dreier (1): IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf() drivers/infiniband/hw/mlx4/cq.c | 4 +- drivers/infiniband/ulp/ipoib/ipoib.h | 2 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 47 +++++++++++++++++++++++++--- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 3 +- 4 files changed, 48 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 2f199c5..4521319 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -246,7 +246,7 @@ err_mtt: if (context) ib_umem_release(cq->umem); else - mlx4_ib_free_cq_buf(dev, &cq->buf, entries); + mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); err_db: if (!context) @@ -434,7 +434,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) mlx4_ib_db_unmap_user(to_mucontext(cq->uobject->context), &mcq->db); ib_umem_release(mcq->umem); } else { - mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); + mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe); mlx4_db_free(dev->dev, &mcq->db); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 9044f88..ca126fc 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -334,6 +334,7 @@ struct ipoib_dev_priv { #endif int hca_caps; struct ipoib_ethtool_st ethtool; + struct timer_list poll_timer; }; struct ipoib_ah { @@ -404,6 +405,7 @@ extern struct workqueue_struct *ipoib_workqueue; int ipoib_poll(struct napi_struct *napi, int budget); void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr); +void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, struct ib_pd *pd, struct ib_ah_attr *attr); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 97b815c..f429bce 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -461,6 +461,26 @@ void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) netif_rx_schedule(dev, &priv->napi); } +static void drain_tx_cq(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned long flags; + + spin_lock_irqsave(&priv->tx_lock, flags); + while (poll_tx(priv)) + ; /* nothing */ + + if (netif_queue_stopped(dev)) + mod_timer(&priv->poll_timer, jiffies + 1); + + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr) +{ + drain_tx_cq((struct net_device *)dev_ptr); +} + static inline int post_send(struct ipoib_dev_priv *priv, unsigned int wr_id, struct ib_ah *address, u32 qpn, @@ -555,12 +575,22 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, else priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM; + if (++priv->tx_outstanding == ipoib_sendq_size) { + ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); + if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP)) + ipoib_warn(priv, "request notify on send CQ failed\n"); + netif_stop_queue(dev); + } + if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), address->ah, qpn, tx_req, phead, hlen))) { ipoib_warn(priv, "post_send failed\n"); ++dev->stats.tx_errors; + --priv->tx_outstanding; ipoib_dma_unmap_tx(priv->ca, tx_req); dev_kfree_skb_any(skb); + if (netif_queue_stopped(dev)) + netif_wake_queue(dev); } else { dev->trans_start = jiffies; @@ -568,14 +598,11 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, ++priv->tx_head; skb_orphan(skb); - if (++priv->tx_outstanding == ipoib_sendq_size) { - ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); - netif_stop_queue(dev); - } } if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) - poll_tx(priv); + while (poll_tx(priv)) + ; /* nothing */ } static void __ipoib_reap_ah(struct net_device *dev) @@ -609,6 +636,11 @@ void ipoib_reap_ah(struct work_struct *work) 
round_jiffies_relative(HZ)); } +static void ipoib_ib_tx_timer_func(unsigned long ctx) +{ + drain_tx_cq((struct net_device *)ctx); +} + int ipoib_ib_dev_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -645,6 +677,10 @@ int ipoib_ib_dev_open(struct net_device *dev) queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, round_jiffies_relative(HZ)); + init_timer(&priv->poll_timer); + priv->poll_timer.function = ipoib_ib_tx_timer_func; + priv->poll_timer.data = (unsigned long)dev; + set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); return 0; @@ -810,6 +846,7 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) ipoib_dbg(priv, "All sends and receives done.\n"); timeout: + del_timer_sync(&priv->poll_timer); qp_attr.qp_state = IB_QPS_RESET; if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE)) ipoib_warn(priv, "Failed to modify QP to RESET state\n"); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index c1e7ece..8766d29 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -187,7 +187,8 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) goto out_free_mr; } - priv->send_cq = ib_create_cq(priv->ca, NULL, NULL, dev, ipoib_sendq_size, 0); + priv->send_cq = ib_create_cq(priv->ca, ipoib_send_comp_handler, NULL, + dev, ipoib_sendq_size, 0); if (IS_ERR(priv->send_cq)) { printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name); goto out_free_recv_cq;